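The data file iris.txt contains measurements of iris flowers. The dataset has 150 samples in 3 classes, with 50 samples per class. In this clustering experiment only the 4 feature values of each sample are used; the class label is discarded. The goal is to cluster the iris data with the K-Means algorithm from the MLlib library.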
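The dataset is listed below. Copy and paste it into a file named iris.txt, one comma-separated record per line, without the leading "- " list markers: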
- 5.1,3.5,1.4,0.2,Iris-setosa
- 4.9,3.0,1.4,0.2,Iris-setosa
- 4.7,3.2,1.3,0.2,Iris-setosa
- 4.6,3.1,1.5,0.2,Iris-setosa
- 5.0,3.6,1.4,0.2,Iris-setosa
- 5.4,3.9,1.7,0.4,Iris-setosa
- 4.6,3.4,1.4,0.3,Iris-setosa
- 5.0,3.4,1.5,0.2,Iris-setosa
- 4.4,2.9,1.4,0.2,Iris-setosa
- 4.9,3.1,1.5,0.1,Iris-setosa
- 5.4,3.7,1.5,0.2,Iris-setosa
- 4.8,3.4,1.6,0.2,Iris-setosa
- 4.8,3.0,1.4,0.1,Iris-setosa
- 4.3,3.0,1.1,0.1,Iris-setosa
- 5.8,4.0,1.2,0.2,Iris-setosa
- 5.7,4.4,1.5,0.4,Iris-setosa
- 5.4,3.9,1.3,0.4,Iris-setosa
- 5.1,3.5,1.4,0.3,Iris-setosa
- 5.7,3.8,1.7,0.3,Iris-setosa
- 5.1,3.8,1.5,0.3,Iris-setosa
- 5.4,3.4,1.7,0.2,Iris-setosa
- 5.1,3.7,1.5,0.4,Iris-setosa
- 4.6,3.6,1.0,0.2,Iris-setosa
- 5.1,3.3,1.7,0.5,Iris-setosa
- 4.8,3.4,1.9,0.2,Iris-setosa
- 5.0,3.0,1.6,0.2,Iris-setosa
- 5.0,3.4,1.6,0.4,Iris-setosa
- 5.2,3.5,1.5,0.2,Iris-setosa
- 5.2,3.4,1.4,0.2,Iris-setosa
- 4.7,3.2,1.6,0.2,Iris-setosa
- 4.8,3.1,1.6,0.2,Iris-setosa
- 5.4,3.4,1.5,0.4,Iris-setosa
- 5.2,4.1,1.5,0.1,Iris-setosa
- 5.5,4.2,1.4,0.2,Iris-setosa
- 4.9,3.1,1.5,0.1,Iris-setosa
- 5.0,3.2,1.2,0.2,Iris-setosa
- 5.5,3.5,1.3,0.2,Iris-setosa
- 4.9,3.1,1.5,0.1,Iris-setosa
- 4.4,3.0,1.3,0.2,Iris-setosa
- 5.1,3.4,1.5,0.2,Iris-setosa
- 5.0,3.5,1.3,0.3,Iris-setosa
- 4.5,2.3,1.3,0.3,Iris-setosa
- 4.4,3.2,1.3,0.2,Iris-setosa
- 5.0,3.5,1.6,0.6,Iris-setosa
- 5.1,3.8,1.9,0.4,Iris-setosa
- 4.8,3.0,1.4,0.3,Iris-setosa
- 5.1,3.8,1.6,0.2,Iris-setosa
- 4.6,3.2,1.4,0.2,Iris-setosa
- 5.3,3.7,1.5,0.2,Iris-setosa
- 5.0,3.3,1.4,0.2,Iris-setosa
- 7.0,3.2,4.7,1.4,Iris-versicolor
- 6.4,3.2,4.5,1.5,Iris-versicolor
- 6.9,3.1,4.9,1.5,Iris-versicolor
- 5.5,2.3,4.0,1.3,Iris-versicolor
- 6.5,2.8,4.6,1.5,Iris-versicolor
- 5.7,2.8,4.5,1.3,Iris-versicolor
- 6.3,3.3,4.7,1.6,Iris-versicolor
- 4.9,2.4,3.3,1.0,Iris-versicolor
- 6.6,2.9,4.6,1.3,Iris-versicolor
- 5.2,2.7,3.9,1.4,Iris-versicolor
- 5.0,2.0,3.5,1.0,Iris-versicolor
- 5.9,3.0,4.2,1.5,Iris-versicolor
- 6.0,2.2,4.0,1.0,Iris-versicolor
- 6.1,2.9,4.7,1.4,Iris-versicolor
- 5.6,2.9,3.6,1.3,Iris-versicolor
- 6.7,3.1,4.4,1.4,Iris-versicolor
- 5.6,3.0,4.5,1.5,Iris-versicolor
- 5.8,2.7,4.1,1.0,Iris-versicolor
- 6.2,2.2,4.5,1.5,Iris-versicolor
- 5.6,2.5,3.9,1.1,Iris-versicolor
- 5.9,3.2,4.8,1.8,Iris-versicolor
- 6.1,2.8,4.0,1.3,Iris-versicolor
- 6.3,2.5,4.9,1.5,Iris-versicolor
- 6.1,2.8,4.7,1.2,Iris-versicolor
- 6.4,2.9,4.3,1.3,Iris-versicolor
- 6.6,3.0,4.4,1.4,Iris-versicolor
- 6.8,2.8,4.8,1.4,Iris-versicolor
- 6.7,3.0,5.0,1.7,Iris-versicolor
- 6.0,2.9,4.5,1.5,Iris-versicolor
- 5.7,2.6,3.5,1.0,Iris-versicolor
- 5.5,2.4,3.8,1.1,Iris-versicolor
- 5.5,2.4,3.7,1.0,Iris-versicolor
- 5.8,2.7,3.9,1.2,Iris-versicolor
- 6.0,2.7,5.1,1.6,Iris-versicolor
- 5.4,3.0,4.5,1.5,Iris-versicolor
- 6.0,3.4,4.5,1.6,Iris-versicolor
- 6.7,3.1,4.7,1.5,Iris-versicolor
- 6.3,2.3,4.4,1.3,Iris-versicolor
- 5.6,3.0,4.1,1.3,Iris-versicolor
- 5.5,2.5,4.0,1.3,Iris-versicolor
- 5.5,2.6,4.4,1.2,Iris-versicolor
- 6.1,3.0,4.6,1.4,Iris-versicolor
- 5.8,2.6,4.0,1.2,Iris-versicolor
- 5.0,2.3,3.3,1.0,Iris-versicolor
- 5.6,2.7,4.2,1.3,Iris-versicolor
- 5.7,3.0,4.2,1.2,Iris-versicolor
- 5.7,2.9,4.2,1.3,Iris-versicolor
- 6.2,2.9,4.3,1.3,Iris-versicolor
- 5.1,2.5,3.0,1.1,Iris-versicolor
- 5.7,2.8,4.1,1.3,Iris-versicolor
- 6.3,3.3,6.0,2.5,Iris-virginica
- 5.8,2.7,5.1,1.9,Iris-virginica
- 7.1,3.0,5.9,2.1,Iris-virginica
- 6.3,2.9,5.6,1.8,Iris-virginica
- 6.5,3.0,5.8,2.2,Iris-virginica
- 7.6,3.0,6.6,2.1,Iris-virginica
- 4.9,2.5,4.5,1.7,Iris-virginica
- 7.3,2.9,6.3,1.8,Iris-virginica
- 6.7,2.5,5.8,1.8,Iris-virginica
- 7.2,3.6,6.1,2.5,Iris-virginica
- 6.5,3.2,5.1,2.0,Iris-virginica
- 6.4,2.7,5.3,1.9,Iris-virginica
- 6.8,3.0,5.5,2.1,Iris-virginica
- 5.7,2.5,5.0,2.0,Iris-virginica
- 5.8,2.8,5.1,2.4,Iris-virginica
- 6.4,3.2,5.3,2.3,Iris-virginica
- 6.5,3.0,5.5,1.8,Iris-virginica
- 7.7,3.8,6.7,2.2,Iris-virginica
- 7.7,2.6,6.9,2.3,Iris-virginica
- 6.0,2.2,5.0,1.5,Iris-virginica
- 6.9,3.2,5.7,2.3,Iris-virginica
- 5.6,2.8,4.9,2.0,Iris-virginica
- 7.7,2.8,6.7,2.0,Iris-virginica
- 6.3,2.7,4.9,1.8,Iris-virginica
- 6.7,3.3,5.7,2.1,Iris-virginica
- 7.2,3.2,6.0,1.8,Iris-virginica
- 6.2,2.8,4.8,1.8,Iris-virginica
- 6.1,3.0,4.9,1.8,Iris-virginica
- 6.4,2.8,5.6,2.1,Iris-virginica
- 7.2,3.0,5.8,1.6,Iris-virginica
- 7.4,2.8,6.1,1.9,Iris-virginica
- 7.9,3.8,6.4,2.0,Iris-virginica
- 6.4,2.8,5.6,2.2,Iris-virginica
- 6.3,2.8,5.1,1.5,Iris-virginica
- 6.1,2.6,5.6,1.4,Iris-virginica
- 7.7,3.0,6.1,2.3,Iris-virginica
- 6.3,3.4,5.6,2.4,Iris-virginica
- 6.4,3.1,5.5,1.8,Iris-virginica
- 6.0,3.0,4.8,1.8,Iris-virginica
- 6.9,3.1,5.4,2.1,Iris-virginica
- 6.7,3.1,5.6,2.4,Iris-virginica
- 6.9,3.1,5.1,2.3,Iris-virginica
- 5.8,2.7,5.1,1.9,Iris-virginica
- 6.8,3.2,5.9,2.3,Iris-virginica
- 6.7,3.3,5.7,2.5,Iris-virginica
- 6.7,3.0,5.2,2.3,Iris-virginica
- 6.3,2.5,5.0,1.9,Iris-virginica
- 6.5,3.0,5.2,2.0,Iris-virginica
- 6.2,3.4,5.4,2.3,Iris-virginica
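1) Start spark shell from the command line
2) Import the required packages
3) Read the file and load the data: read the file with SparkContext's built-in textFile(..) method and transform it into an RDD.
Apply the filter operator with a regular expression to drop the iris class label from each record, then inspect the data.
4) Cluster the dataset into 2 clusters with 5 iterations, training the K-Means model
5) Print the cluster centers of the model
6) Use the predict() method to determine which cluster each sample belongs to
7) Evaluate the model with the sum of squared errors (a measure of clustering quality)
8) Test the model on a single data point
9) Exit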
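Steps 1 to 9 can be sketched as the following spark-shell session. This is a minimal sketch rather than the original code: it assumes iris.txt was saved in the directory that spark-shell was started from, that every line is a plain record such as `5.1,3.5,1.4,0.2,Iris-setosa`, and that the RDD-based `org.apache.spark.mllib` API is used. The path and the sample point in step 8 are hypothetical.

```scala
// 1) Start the shell from a terminal with the command: spark-shell
//    The SparkContext `sc` used below is created by the shell.

// 2) Import the RDD-based MLlib classes.
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// 3) Load the file (the path is an assumption; use a file:// or hdfs:// URI if needed).
val rawData = sc.textFile("iris.txt")
// Keep only fields that look like decimal numbers, which drops the class label.
val trainingData = rawData.map { line =>
  val features = line.split(",").filter(_.matches("\\d*(\\.?)\\d*")).map(_.toDouble)
  Vectors.dense(features)
}.cache()
trainingData.take(5).foreach(println)

// 4) Train K-Means with k = 2 clusters and at most 5 iterations.
val model: KMeansModel = KMeans.train(trainingData, 2, 5)

// 5) Print the cluster centers.
model.clusterCenters.foreach(center => println(s"Cluster center: $center"))

// 6) Assign every sample to its nearest cluster.
trainingData.collect().foreach { sample =>
  println(s"$sample belongs to cluster ${model.predict(sample)}")
}

// 7) Within-set sum of squared errors: lower means tighter clusters.
val wssse = model.computeCost(trainingData)
println(s"Within-set sum of squared errors = $wssse")

// 8) Predict the cluster of a single point (hypothetical values).
val point = Vectors.dense(5.0, 3.4, 1.5, 0.2)
println(s"$point belongs to cluster ${model.predict(point)}")

// 9) Leave the shell with the command: :quit
```

The initial centers are chosen randomly, so the centers and cluster indices may differ between runs. Setting k to 3 instead of 2 would let the clusters be compared against the three species in the file.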
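Use GraphX to build a flight network graph, count the airports and routes in the graph, find the longest flight routes, and find the busiest airport.
The dataset is available here:
Download link: https://pan.baidu.com/s/1bW-mwDwN6sDm4s6KGCytKA
Extraction code: 21g4
1) Import the packages
2) Load the CSV as an RDD. Each airport is a vertex and each route is an edge whose attribute is the flight distance. Initialize the vertex set airport: RDD[(VertexId, String)], where the vertex attribute is the airport name. Initialize the edge set lines: RDD[Edge], where the edge attribute is the flight distance.
3) Analyze the graph: count the airports and routes in the flight network graph
4) Find the longest flight routes
5) Find the busiest airport, that is, the airport with the most arriving flights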
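The five steps above can be sketched in spark-shell as follows. The column layout of the downloadable CSV is not given here, so this sketch does not read that file: it uses a few hypothetical in-memory rows with the assumed layout `originId,originName,destId,destName,distance`, and the distances are illustrative values only. To use the real file, replace the `parallelize` call with `sc.textFile(...)` and adjust the column indexes.

```scala
// 1) Import the GraphX classes (run inside spark-shell, where `sc` already exists).
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// 2) Hypothetical sample rows: originId,originName,destId,destName,distance
val rows = sc.parallelize(Seq(
  "1,SFO,2,ORD,1846",
  "2,ORD,3,DFW,802",
  "3,DFW,1,SFO,1464",
  "1,SFO,3,DFW,1464"
)).map(_.split(","))

// Vertex set: (airport id, airport name), deduplicated.
val airport: RDD[(VertexId, String)] =
  rows.flatMap(r => Seq((r(0).toLong, r(1)), (r(2).toLong, r(3)))).distinct()

// Edge set: one edge per distinct route, with the distance as edge attribute.
val lines: RDD[Edge[Int]] =
  rows.map(r => (r(0).toLong, r(2).toLong, r(4).toInt)).distinct().map {
    case (src, dst, dist) => Edge(src, dst, dist)
  }

val graph = Graph(airport, lines)

// 3) Count airports (vertices) and routes (edges).
println(s"airports = ${graph.numVertices}, routes = ${graph.numEdges}")

// 4) Longest routes: copy each triplet into a tuple, then sort by distance, descending.
val longestRoutes = graph.triplets.map(t => (t.srcAttr, t.dstAttr, t.attr)).sortBy(_._3, ascending = false).take(3)
longestRoutes.foreach { case (from, to, dist) => println(s"$from -> $to: $dist") }

// 5) Busiest airports: rank vertices by in-degree (number of inbound edges).
val busiest = graph.inDegrees.join(airport).map {
  case (_, (inDegree, name)) => (name, inDegree)
}.sortBy(_._2, ascending = false).take(3)
busiest.foreach { case (name, inDegree) => println(s"$name: $inDegree inbound") }
```

Because the edges are deduplicated, the in-degree here counts distinct inbound routes. Dropping the `distinct()` on the edge set would instead count every flight record, at the cost of `numEdges` no longer being the number of routes.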