My Python version is 3.12.
Input code:
from pyspark import SparkConf, SparkContext
# Tell PySpark which Python interpreter to use
import os
os.environ['PYSPARK_PYTHON'] = "D:/python/python.exe"

# Create a SparkConf object
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
# Print the Spark version
print(sc.version)
# Data computation
rdd1 = sc.parallelize([1, 2, 3, 4, 5])
rdd3 = rdd1.map(lambda x: x * 10)
print(rdd3.collect())
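For what it's worth, the transformation itself is trivial and can be checked without Spark at all; a plain-Python equivalent of the `map` step above (same data, same lambda) runs fine, which suggests the crash comes from the Spark worker process rather than the logic:

```python
# Plain-Python equivalent of rdd1.map(lambda x: x * 10).collect()
data = [1, 2, 3, 4, 5]
result = [x * 10 for x in data]
print(result)  # [10, 20, 30, 40, 50]
```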
My environment configuration (screenshot):
PATH configuration (screenshot):
The error output is as follows:
D:\python\python.exe "D:\python工具\python学习工具\第二阶段\test pyspark.py"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
3.5.0
24/01/07 16:24:31 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator
24/01/07 16:24:32 ERROR TaskSetManager: Task 14 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
File "D:\python工具\python学习工具\第二阶段\test pyspark.py", line 21, in <module>
print(rdd3.collect())
^^^^^^^^^^^^^^
File "D:\python\Lib\site-packages\pyspark\rdd.py", line 1833, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\Lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "D:\python\Lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 14 in stage 0.0 failed 1 times, most recent failure: Lost task 14.0 in stage 0.0 (TID 14) (lxs010571022059.bdo.com.cn executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2844)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2780)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2779)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2779)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1242)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1242)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1242)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3048)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2971)
at org.apache.spark.util.EventLoop
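"Python worker exited unexpectedly (crashed)" is commonly reported when the worker interpreter is incompatible with the Spark release (Spark 3.5.0 predates Python 3.12) or differs from the driver's interpreter. As a diagnostic sketch, assuming `PYSPARK_PYTHON` is the only relevant setting on this machine, you can compare the driver's Python with the one Spark would launch for workers:

```python
import os
import subprocess
import sys

# PYSPARK_PYTHON decides which interpreter Spark launches for its workers;
# if it is unset, the driver's own interpreter is used as the fallback here.
worker_python = os.environ.get("PYSPARK_PYTHON", sys.executable)

driver_version = sys.version_info[:3]
worker_version = subprocess.run(
    [worker_python, "-c", "import sys; print(sys.version_info[:3])"],
    capture_output=True, text=True,
).stdout.strip()

print("driver:", driver_version)
print("worker:", worker_version)
```

If the two versions differ, or both are 3.12 while running Spark 3.5.0, aligning `PYSPARK_PYTHON` with a Python release the Spark version supports is the first thing to try.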