赞
踩
我们经常使用spark时会对表合并
import spark.implicits._ import spark.implicits._ var data1 = Seq( | ("1", "ming", "hlj"), | ("2", "tian", "jl"), | ("3", "wang", "ln"), | ("4", "qi", "bj"), | ("5", "sun", "tj") | ).toDF("useid", "name", "live") var data2 = Seq( | ("1", "1000", "sing"), | ("3", "2000", "dance"), | ("5", "3000", "rap"), | ("7", "5000", "lanqiu"), | ("9", "8000", "zuqiu") | ).toDF("useid", "salary", "hobby") data1.show() data2.show()
将表data1和data2进行full join,生成新的useid
COALESCE ( expression,value1,value2……,valuen)
COALESCE()函数的第一个参数expression为待检测的表达式,而其后的参数个数不定。
COALESCE()函数将会返回包括expression在内的所有参数中的第一个非空表达式。
如果expression不为空值则返回expression;否则判断value1是否是空值,
如果value1不为空值则返回value1;否则判断value2是否是空值。
data1 = data1.withColumnRenamed(s"useid", s"useid_1")
data2 = data2.withColumnRenamed(s"useid", s"useid_2")
data1.createOrReplaceTempView(s"data1")
data2.createOrReplaceTempView(s"data2")
spark.sql(s"select" +
s" COALESCE(aa.useid_1,bb.useid_2) as useid,*" +
s" from data1 aa full join data2 bb on aa.useid_1=bb.useid_2"+
s"").drop("useid_1").drop("useid_2").show()
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。