A beginner question: I set the environment variable, but running the script still throws an error. What am I doing wrong? Any pointers would be much appreciated, thanks!
Problem solved: it was a Python version issue. My Python version was too new; after downgrading it, everything works. Thanks, everyone.
The code is as follows:
from pyspark import SparkConf, SparkContext
import os

os.environ['PYSPARK_PYTHON'] = "E:/Python_code/pythonProject/.venv/Scripts/python.exe"

# Create the SparkConf object
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
# Create the SparkContext from the SparkConf object
sc = SparkContext(conf=conf)

# Print the Spark version
# print(sc.version)

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.collect())
print(rdd.map(lambda x: x * 10).collect())

# Stop the SparkContext
sc.stop()
Error message:
24/02/22 09:36:38 ERROR Executor: Exception in task 5.0 in stage 1.0 (TID 25)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator
24/02/22 09:36:38 ERROR TaskSetManager: Task 5 in stage 1.0 failed 1 times; aborting job
Traceback (most recent call last):
File "E:\Python_code\pythonProject\spark_test.py", line 14, in <module>
print(rdd.map(lambda x: x * 10).collect())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Python_code\pythonProject\.venv\Lib\site-packages\pyspark\rdd.py", line 1833, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Python_code\pythonProject\.venv\Lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "E:\Python_code\pythonProject\.venv\Lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 25) (GYG-LZC executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2844)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2780)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2779)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2779)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1242)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1242)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1242)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3048)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2971)
at org.apache.spark.util.EventLoop
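For anyone hitting the same crash: "Python worker exited unexpectedly (crashed)" in local mode is often a Python/PySpark version mismatch, i.e. the interpreter in the venv is newer than the installed PySpark release supports. Below is a minimal sketch of a defensive setup. It pins both the worker and driver interpreters to the same Python and fails fast on a too-new version; the (3, 11) upper bound is an illustrative assumption, so check the release notes of your PySpark version for the actual supported range.

import os
import sys

from pyspark import SparkConf, SparkContext

# Pin both the worker and the driver to the current interpreter so Spark
# does not pick up a different (possibly incompatible) Python from PATH.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

# Illustrative guard: older PySpark 3.x releases do not support very new
# Python versions; (3, 11) is an assumed bound, not a documented rule.
if sys.version_info >= (3, 11):
    raise RuntimeError(
        f"Python {sys.version.split()[0]} may be unsupported by this "
        "PySpark release; consider using an older Python version."
    )

conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
print(sc.version)    # confirm which Spark is actually running
print(sys.version)   # and which Python the driver uses
sc.stop()

If the guard fires, recreating the virtual environment with an older Python (as the original poster did) and reinstalling pyspark into it is the usual fix.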