Spark runtime environment setup reference: https://blog.csdn.net/max_cola/article/details/78902597
The corresponding environment variables:
- #java
- export JAVA_HOME=/usr/local/jdk1.8.0_181
- export PATH=$JAVA_HOME/bin:$PATH
- #python
- export PYTHON_HOME=/usr/local/python3
- export PATH=$PYTHON_HOME/bin:$PATH
- #spark
- export SPARK_HOME=/usr/local/spark
- export PATH=$SPARK_HOME/bin:$PATH
- #add spark to python
- export PYTHONPATH=/usr/local/spark/python
- #add pyspark to jupyter
- export PYSPARK_PYTHON=/usr/local/python3/bin/python3  # Two Python versions are installed, so PYSPARK_PYTHON must point at the right one; otherwise PySpark programs will fail.
- export PYSPARK_DRIVER_PYTHON=jupyter
- export PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root'
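After appending these lines to the shell profile (e.g. ~/.bashrc) and re-sourcing it, a minimal sketch like the one below can confirm that the variables are actually visible to Python; it only reads os.environ and assumes nothing beyond the names exported above.

```python
import os

# Sanity check: print the Spark-related variables exported above.
# A value of "<not set>" usually means the profile was not re-sourced.
for var in ("JAVA_HOME", "SPARK_HOME", "PYTHONPATH",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(var, "=", os.environ.get(var, "<not set>"))
```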
A Spark word-count example written in Python:
- # -*- coding: utf-8 -*-
- from __future__ import print_function
- from pyspark import SparkContext
-
- if __name__ == '__main__':
-     sc = SparkContext("local[4]")
-     sc.setLogLevel("WARN")
-     # word count: split the line into words, then count each word
-     rdd = sc.parallelize(["hello Pyspark world"])
-     counts = rdd \
-         .flatMap(lambda line: line.split(" ")) \
-         .map(lambda word: (word, 1)) \
-         .reduceByKey(lambda a, b: a + b)
-     counts.foreach(print)
-     sc.stop()
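In local[4] mode the output appears in the same console, but on a cluster foreach(print) runs on the executors, so the printed lines may not show up in the driver. A minimal variant (same word-count logic, only collecting the small result back to the driver before printing) would be:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark import SparkContext

if __name__ == '__main__':
    sc = SparkContext("local[4]")
    sc.setLogLevel("WARN")
    # Same word count as above, but collect() brings the (word, count)
    # pairs back to the driver, so the printing happens locally.
    counts = sc.parallelize(["hello Pyspark world"]) \
        .flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b) \
        .collect()
    for word, n in counts:
        print(word, n)
    sc.stop()
```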
Running it produces the following error:
- Traceback (most recent call last):
- File "test1.py", line 3, in <module>
- from pyspark import *
- File "/usr/local/spark/python/pyspark/__init__.py", line 46, in <module>
- from pyspark.context import SparkContext
- File "/usr/local/spark/python/pyspark/context.py", line 29, in <module>
- from py4j.protocol import Py4JError
- ImportError: No module named py4j.protocol
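The import fails because py4j is not installed into site-packages: Spark only bundles it as a source zip under $SPARK_HOME/python/lib. Besides the copy-and-unzip fix described below, a common alternative is to put that zip directly on the import path; a minimal sketch, assuming the paths from this installation:

```python
import sys

# py4j's source zip is importable as-is; adjust the version number
# to match the zip shipped with your Spark distribution.
sys.path.insert(0, "/usr/local/spark/python")
sys.path.insert(0, "/usr/local/spark/python/lib/py4j-0.10.7-src.zip")

from pyspark import SparkContext  # should now import without the py4j error
```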
Solution:
- # cd into the Python 3 site-packages directory
- cd /usr/local/python3/lib/python3.6/site-packages
-
- # copy over the py4j package bundled with Spark
- cp /usr/local/spark/python/lib/py4j-0.10.7-src.zip ./
-
- # unzip it so py4j becomes importable
- unzip py4j-0.10.7-src.zip
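Once the zip has been extracted into site-packages, the import that failed earlier should succeed; a quick check:

```python
# Verify that py4j (and therefore pyspark) can now be imported.
from py4j.protocol import Py4JError
from pyspark import SparkContext

print("py4j and pyspark are importable")
```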