Tracing through the kudu-spark.jar source code shows the following:
kudu.batchSize: sets the maximum number of bytes returned by the scanner on each batch. The default is 20 MB.
kudu.splitSizeBytes: sets the target number of bytes per Spark task. If set, each tablet's primary key range is split to generate uniform task sizes, instead of the default of one task per tablet.
The tuned settings are:
val sqlDF = spark.sqlContext.read.options(
  Map("kudu.master" -> kuduMasters,
    "kudu.table" -> kuduTableName,
    // 400 MB (400 * 1024 * 1024 bytes)
    "kudu.batchSize" -> "419430400",
    // 10 GB (10 * 1024 * 1024 * 1024 bytes)
    "kudu.splitSizeBytes" -> "10737418240")).format("kudu").load.cache()
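Hand-typed byte counts are easy to get wrong: 419430400 is 400 × 1024 × 1024 bytes, and 10737418240 is 10 × 1024³ bytes. A minimal sketch of deriving the two values from readable units follows. The object KuduSizes and the helper sizeOptions are hypothetical names used only for illustration; they are not part of kudu-spark.

```scala
// Hypothetical helper (not part of kudu-spark): derive the byte values
// for the two size options from readable units.
object KuduSizes {
  val MiB: Long = 1024L * 1024L
  val GiB: Long = 1024L * MiB

  // Returns only the size-related options; merge with kudu.master / kudu.table.
  def sizeOptions(batchMiB: Long, splitGiB: Long): Map[String, String] =
    Map(
      "kudu.batchSize"      -> (batchMiB * MiB).toString,
      "kudu.splitSizeBytes" -> (splitGiB * GiB).toString
    )
}
```

The result can be merged into the options from the snippet above, for example Map("kudu.master" -> kuduMasters, "kudu.table" -> kuduTableName) ++ KuduSizes.sizeOptions(400, 10), which yields the same two values as the hard-coded strings.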