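Computing the correlation between two series of data is a common operation in statistics. Spark's RDD-based API, spark.mllib (the package that provides the Statistics.corr call used below), offers operations for computing pairwise correlations among many series. The currently supported correlation methods are Pearson and Spearman.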
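import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// sc is an existing SparkContext (for example, the one provided by spark-shell)
// Array[Double] is required: a plain Array(1, 2, ...) would yield RDD[Int] and not compile
val seriesX: RDD[Double] = sc.parallelize(Array[Double](1, 2, 3, 3, 5)) // a series
// must have the same number of partitions and cardinality as seriesX
val seriesY: RDD[Double] = sc.parallelize(Array[Double](11, 22, 33, 33, 555))
// compute the correlation using Pearson's method. Enter "spearman" for Spearman's method. If a
// method is not specified, Pearson's method will be used by default.
val correlation: Double = Statistics.corr(seriesX, seriesY, "pearson")
println(s"Correlation is: $correlation")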
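The text mentions pairwise correlations among many series, while the snippet above only covers two. The following is a minimal sketch of the many-series case, assuming each row of an RDD[Vector] is one observation and each column is one series. The sample values, the application name and the local master setting are illustrative assumptions, not taken from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Assumption: a local SparkContext created only for this example.
// In spark-shell, skip these two lines and use the provided `sc`.
val conf = new SparkConf().setMaster("local[*]").setAppName("CorrelationMatrixSketch")
val sc = new SparkContext(conf)

// Hypothetical data: each Vector is a row (one observation),
// each column is one series, so there are three series here.
val data: RDD[Vector] = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(5.0, 33.0, 366.0)))

// Compute the correlation matrix between columns with Pearson's method.
// Pass "spearman" to use Spearman's method instead.
val correlMatrix: Matrix = Statistics.corr(data, "pearson")
println(correlMatrix.toString)

sc.stop()
```

The result is a local 3 x 3 matrix in which entry (i, j) holds the correlation between column i and column j, so the diagonal is 1 and the matrix is symmetric.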