https://stackoverflow.com/questions/32435263/dataframe-join-optimization-broadcast-hash-join
import org.apache.spark.sql.functions.broadcast
// hiveContext.sql("SET spark.sql.autoBroadcastJoinThreshold = -1") // Do not add this line: setting the threshold to -1 disables automatic broadcast joins
smallDataframe.cache() // cache() marks the DataFrame as cached and returns it, so no reassignment is needed
largeDataframe.join(broadcast(smallDataframe), ...)
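The snippet above leaves the join condition elided. As a minimal sketch of how the hint might be exercised end to end, the example below builds two hypothetical DataFrames that share an `id` column, joins them with `broadcast(...)`, and prints the physical plan so you can check whether a broadcast hash join was chosen. The object name, the `buildJoin` helper, the sample data, and the local-mode session are all assumptions made for illustration; they are not from the linked question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

// Hypothetical wrapper object, used only for this illustration.
object BroadcastJoinSketch {

  // Builds a broadcast-hinted join over made-up data (assumed shared key: "id").
  def buildJoin(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Stand-in for the large side: one million rows with a single "id" column.
    val largeDataframe = spark.range(0L, 1000000L).toDF("id")

    // Stand-in for the small lookup side.
    val smallDataframe = Seq((0L, "a"), (1L, "b"), (2L, "c")).toDF("id", "label")

    // broadcast() marks the small side as a candidate for broadcasting to every executor.
    largeDataframe.join(broadcast(smallDataframe), Seq("id"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()

    val joined = buildJoin(spark)

    // Print the physical plan; look for a BroadcastHashJoin node in the output.
    joined.explain()

    spark.stop()
  }
}
```

Inspecting the output of `explain()` is usually the quickest way to confirm which join strategy the planner picked, since the hint is a request rather than a guarantee.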