1. Download and install Spark.
2. Download and install Python.
3. Create an environment variable SPARK_HOME pointing to D:\Spark\spark-2.0.1-bin-hadoop2.6.
4. Add the path D:\Spark\spark-2.0.1-bin-hadoop2.6\python\pyspark to the environment variables.
5. Copy the pyspark folder under D:\Spark\spark-2.0.1-bin-hadoop2.6 into the Python installation path: D:\Python\Python35\Lib.
6. Test the following code in the interpreter that ships with Python:
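As an alternative to copying the pyspark folder in step 5, the paths from steps 3 and 4 can also be wired up at runtime from Python itself. A minimal sketch, assuming the install location from step 3 (adjust the path to your own machine); the py4j archive name varies by Spark release, so it is located with a glob rather than hard-coded:

```python
import glob
import os
import sys

# Assumed install location from step 3; change to match your setup.
SPARK_HOME = r"D:\Spark\spark-2.0.1-bin-hadoop2.6"
os.environ["SPARK_HOME"] = SPARK_HOME

# Make the bundled pyspark package importable without copying it
# into the Python installation directory.
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

# pyspark also needs its bundled py4j dependency; the version in the
# filename differs between Spark releases, so search for it.
py4j_matches = glob.glob(
    os.path.join(SPARK_HOME, "python", "lib", "py4j-*-src.zip"))
if py4j_matches:
    sys.path.insert(0, py4j_matches[0])
```

After running this, `from pyspark import SparkContext` should succeed in the same session, with no files copied into D:\Python\Python35\Lib.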
from pyspark import SparkContext

logFile = "F:\\testData\\test.txt"        # local text file to analyze
sc = SparkContext("local", "Simple App")  # run Spark in local mode
logData = sc.textFile(logFile).cache()    # load the file and cache the RDD
numAs = logData.filter(lambda s: 'a' in s).count()  # lines containing 'a'
numBs = logData.filter(lambda s: 'b' in s).count()  # lines containing 'b'
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
sc.stop()                                 # release Spark resources
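The filter/count logic above can be checked without a Spark installation. This is a plain-Python analogue of the same job, using a small in-memory sample (a stand-in for F:\testData\test.txt, whose contents the article does not show) instead of an RDD:

```python
# Stand-in sample data; the real job reads F:\testData\test.txt.
lines = ["apache spark", "big data", "hello world", "banana"]

# Same predicate as the Spark filter(...).count() calls above.
num_as = sum(1 for s in lines if 'a' in s)  # lines containing 'a'
num_bs = sum(1 for s in lines if 'b' in s)  # lines containing 'b'

print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
# → Lines with a: 3, lines with b: 2
```

Spark's `filter` and `count` apply the same per-line predicate, just distributed across partitions, so the Spark job prints counts of the same form for the real input file.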
7. The execution result is as follows: