赞
踩
KNN算法原理可以参考:数据挖掘笔记-分类-KNN-1
基于Spark简单实现算法代码如下:
object SparkKNN extends Serializable {
def main(args: Array[String]) {
if (args.length != 4) {
println("error, please input three path.");
println("1 train set path.");
println("2 test set path.");
println("3 output path.");
println("4 k value.");
System.exit(1)
}
val sc = new SparkContext("spark://centos.host1:7077", "Spark KNN")
val trainSet = sc.textFile(args(0)).map(line => {
var datas = line.split(" ")
(datas(0), datas(1), datas(2))
})
var bcTrainSet = sc.broadcast(trainSet.collect())
var bcK = sc.broadcast(args(3).toInt)
val testSet = sc.textFi
![](https://csdnimg.cn/release/blogv2/dist/pc/img/newCodeMoreWhite.png)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。