(1) Create an RDD from a text file (Scala)
val lines = sc.textFile("/home/test.txt")
(2) Split each line on spaces and flatten the results into a single RDD of words
val words = lines.flatMap(_.split(" "))
(3) Map each word to a (word, 1) pair
val tuplewords = words.map((_, 1))
(4) Reduce the pairs by key, summing the counts for each word
val wordcount = tuplewords.reduceByKey(_ + _)
(5) Sort the word counts in descending order of count
val sortwordcount = wordcount.sortBy(_._2, false)
The five steps chained into a single Scala expression that also prints the result:
sc.textFile("/home/test.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false).collect.foreach(println)
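The intermediate result of each step can be traced on a small in-memory sample without a Spark installation. The following is a minimal plain-Python sketch, not Spark code: the sample lines are made up for illustration, and ordinary lists stand in for RDDs.

```python
from collections import defaultdict

# Made-up sample standing in for the lines of /home/test.txt (assumption)
lines = ["hello spark", "hello world", "hello spark streaming"]

# (2) flatMap: split each line on spaces and flatten into one list of words
words = [word for line in lines for word in line.split(" ")]

# (3) map: pair every word with the count 1
tuplewords = [(word, 1) for word in words]

# (4) reduceByKey: add up the counts that share the same word
totals = defaultdict(int)
for word, count in tuplewords:
    totals[word] += count
wordcount = list(totals.items())

# (5) sortBy on the count, descending
sortwordcount = sorted(wordcount, key=lambda wc: wc[1], reverse=True)

for pair in sortwordcount:
    print(pair)
```

Here "hello" ends up first with a count of 3. In Spark the relative order of words with equal counts is not guaranteed, so only the ordering by count should be relied on.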
(1) Create an RDD from a text file (PySpark)
lines = sc.textFile("/home/test.txt")
(2) Split each line on spaces and flatten the results into a single RDD of words
words = lines.flatMap(lambda line : line.split(' '))
(3) Map each word to a (word, 1) pair
tuplewords = words.map(lambda word : (word, 1))
(4) Reduce the pairs by key, summing the counts for each word
wordcount = tuplewords.reduceByKey(lambda a, b : a + b)
(5) Sort the word counts in descending order of count
sortwordcount = wordcount.sortBy(lambda wc : wc[1], False)
The five steps chained into a single PySpark expression, with the result printed in a loop:
for line in sc.textFile('/home/test.txt').flatMap(lambda line : line.split(' ')).map(lambda word : (word, 1)).reduceByKey(lambda a, b : a + b).sortBy(lambda tup : tup[1], False).collect():
    print(line)
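One caveat of splitting on a single space: when a line contains consecutive or trailing spaces, the split yields empty strings, which would then be counted as a word. The sketch below shows the difference in plain Python. The `tokenize` helper is hypothetical (it is not part of the original code) and is only one possible way to avoid the empty tokens.

```python
# Sample line with two spaces in the middle and one trailing space
line = "hello  spark "

# Splitting on a single space keeps the empty strings between separators
by_space = line.split(" ")

# split() with no argument splits on runs of whitespace and drops empty strings
by_whitespace = line.split()

def tokenize(text):
    """Hypothetical helper: whitespace-based split that never returns empty tokens."""
    return text.split()

print(by_space)       # ['hello', '', 'spark', '']
print(by_whitespace)  # ['hello', 'spark']
```

Under this assumption, step (2) could pass the helper instead of the lambda, for example `lines.flatMap(tokenize)`.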