When running a PySpark project from PyCharm on a Windows 10 machine that has no Spark environment configured, but on which the pyspark library has been installed via pip install pyspark, the code itself shows no errors, yet the following error appears at runtime:
Spark cannot find the location of the Python environment, so the Python interpreter has to be specified explicitly.
(1) As shown in the figure, open the Edit Configurations dialog:
(2) As shown in the figure, click to edit the environment variables:
(3) As shown in the figure, add a PYSPARK_PYTHON environment variable (its value is the path to the Python interpreter to use):
(4) Click OK, then Apply, and run the project again:
The error is resolved.
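As an alternative to editing the PyCharm run configuration, the same variable can be set from the script itself before the SparkContext is created. The snippet below is a minimal sketch of that approach; it assumes the interpreter running the script is also the one Spark should use, so sys.executable is taken as the path.

import os
import sys

# Assumption: the interpreter running this script is the one Spark should use.
# Setting these before SparkContext is created tells Spark which Python to launch.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable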
The test code is a simple word-frequency count; it is included below as well:
import pyspark

# Word-frequency count
def word_statistics(words):
    conf = pyspark.SparkConf().setMaster("local[*]").setAppName("Word_Statistics")
    sc = pyspark.SparkContext(conf=conf)

    # Distribute the word list, map each word to (word, 1), then sum the counts per word
    rdd = sc.parallelize(words)
    counts = rdd.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    print(counts.collect())

    sc.stop()

if __name__ == "__main__":
    words = ["test1", "test2", "test1", "test2", "test3", "test2", "test1", "test5", "test4", "test2", "test6", "test7"]
    word_statistics(words)
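For this input list, collect() returns (word, count) pairs such as ('test1', 3) and ('test2', 4), with one occurrence each for the remaining words; note that the order of the pairs returned by reduceByKey is not guaranteed.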