PySpark on Windows: single-machine setup

Download and install JDK 17, then set JAVA_HOME.
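A quick way to confirm that JAVA_HOME really points at a JDK 17 install is to run the java binary under it and parse the version banner. This is a minimal sketch that assumes the usual `java -version` format (a quoted version string written to stderr); the function names are hypothetical helpers, not part of any tool.

```python
import os
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_381" scheme (major = 8) as well as the
    modern "17.0.10" scheme (major = 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy 1.x numbering
    return first


def java_from_java_home() -> Optional[str]:
    """Return the java executable under %JAVA_HOME%\\bin, if it exists."""
    java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        return None
    exe = "java.exe" if os.name == "nt" else "java"
    candidate = os.path.join(java_home, "bin", exe)
    return candidate if os.path.isfile(candidate) else None


def java_home_major() -> Optional[int]:
    """Run `java -version` from JAVA_HOME and return its major version."""
    java = java_from_java_home()
    if java is None:
        return None
    # `java -version` prints its banner on stderr, not stdout.
    proc = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(proc.stderr)
```

With the setup described here, `print(java_home_major())` should print 17; `None` means JAVA_HOME is unset or does not contain `bin\java.exe`.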

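Download and install hadoop-3.3.5, replace its entire bin directory with the Windows binaries from the winutils repository below, then set HADOOP_HOME.

Hadoop download: Index of /hadoop/common/hadoop-3.3.5

Windows binaries: GitHub - cdarlint/winutils: winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows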
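After swapping in the new bin directory, it is worth checking that the Windows binaries actually landed under HADOOP_HOME. The sketch below assumes `winutils.exe` and `hadoop.dll` are the files that matter most for local Spark (add `hdfs.dll` if you need it); `missing_hadoop_binaries` is a hypothetical helper, and the `is_file` parameter only exists to make it easy to test.

```python
import os
from typing import Callable, List, Tuple

# Files from the winutils repository that local Spark on Windows commonly
# relies on. This list is an assumption; extend it if your workload needs more.
REQUIRED_HADOOP_BINARIES: Tuple[str, ...] = ("winutils.exe", "hadoop.dll")


def missing_hadoop_binaries(
    hadoop_home: str,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return the required files that are absent from HADOOP_HOME\\bin."""
    bin_dir = os.path.join(hadoop_home, "bin")
    return [
        name
        for name in REQUIRED_HADOOP_BINARIES
        if not is_file(os.path.join(bin_dir, name))
    ]
```

Usage: `print(missing_hadoop_binaries(os.environ["HADOOP_HOME"]))` should print an empty list once the bin directory has been replaced.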

Download Spark and set SPARK_HOME.
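With three *_HOME variables in play, a single check over the environment helps catch a variable that was set in the wrong scope or a bin folder that never made it onto Path. This sketch assumes the common convention of adding `%JAVA_HOME%\bin`, `%HADOOP_HOME%\bin` and `%SPARK_HOME%\bin` to Path; `check_spark_env` is a hypothetical helper that accepts any mapping, so it can be fed `os.environ` directly.

```python
import os
from typing import Dict, List, Mapping

REQUIRED_HOMES = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def _norm(path: str) -> str:
    # Normalize separators and case so "C:\\jdk\\bin" and "c:/jdk/bin" compare equal on Windows.
    return os.path.normcase(os.path.normpath(path))


def check_spark_env(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Report unset *_HOME variables and *_HOME\\bin folders missing from PATH."""
    path_entries = {_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    unset = [name for name in REQUIRED_HOMES if not env.get(name)]
    not_on_path = [
        name
        for name in REQUIRED_HOMES
        if env.get(name) and _norm(os.path.join(env[name], "bin")) not in path_entries
    ]
    return {"unset": unset, "bin_not_on_path": not_on_path}
```

Usage: `print(check_spark_env(os.environ))`; both lists should be empty after a fresh terminal picks up the new variables.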

Install pyspark (`pip install pyspark`; this setup uses pyspark 3.5.1).
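The pip-installed pyspark package should match the Spark distribution under SPARK_HOME. This sketch compares the two, assuming the Spark binary distribution ships a `RELEASE` file whose text contains something like `Spark 3.5.1 built for Hadoop ...`, and assuming an equal major.minor version is close enough (matching the exact version is the safest choice). All function names are hypothetical.

```python
import os
import re
from importlib import metadata
from typing import Optional


def spark_release_version(release_text: str) -> Optional[str]:
    """Pull 'X.Y.Z' out of text such as 'Spark 3.5.1 built for Hadoop 3.3.4'."""
    match = re.search(r"Spark (\d+\.\d+\.\d+)", release_text)
    return match.group(1) if match else None


def same_minor(a: str, b: str) -> bool:
    """True when two version strings agree on major.minor (e.g. 3.5.1 vs 3.5.0)."""
    return a.split(".")[:2] == b.split(".")[:2]


def installed_pyspark_version() -> Optional[str]:
    """Version of the pip-installed pyspark package, or None if it is missing."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def spark_home_version() -> Optional[str]:
    """Version read from %SPARK_HOME%\\RELEASE, or None if it cannot be found."""
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        return None
    release = os.path.join(spark_home, "RELEASE")
    if not os.path.isfile(release):
        return None
    with open(release, encoding="utf-8") as f:
        return spark_release_version(f.read())
```

Usage: if both `installed_pyspark_version()` and `spark_home_version()` return a value, `same_minor(...)` on them should be `True`. If your download has no RELEASE file, compare against the folder name instead.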

Demo

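Error encountered:

org.apache.spark.SparkException: Python worker failed to connect back.

Note: you need to tell Spark which Python interpreter to use, for example by setting the PYSPARK_PYTHON environment variable to the full path of your python.exe.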
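One way to see why this error happens is to resolve the interpreter Spark would launch for its Python workers. The sketch assumes PySpark uses PYSPARK_PYTHON when it is set and otherwise falls back to the command `python3` looked up on Path; on Windows `python3` is often missing or is only the Microsoft Store alias, which is one common cause of the error above. `worker_python` is a hypothetical helper.

```python
import shutil
from typing import Callable, Mapping, Optional


def worker_python(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Resolve the executable PySpark would start for its Python workers.

    Assumption: PYSPARK_PYTHON wins when set; otherwise "python3" is looked
    up on PATH. Returns None when nothing executable is found.
    """
    command = env.get("PYSPARK_PYTHON") or "python3"
    return which(command)
```

Usage: `print(worker_python(os.environ))`. If it prints `None`, or a path that is not your real Python 3.10 install, set PYSPARK_PYTHON (for example to `sys.executable`) before creating the SparkSession.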

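```python
import os
import sys
import time

from pyspark.sql import SparkSession

# Point the Python workers (and driver) at the current interpreter;
# without this, Windows can fail with
# "Python worker failed to connect back."
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Create the SparkSession
spark = SparkSession.builder.appName("CSV to DataFrame").getOrCreate()

# Read the CSV file into a DataFrame
csv_file_path = "../large_test_file.csv"  # replace with your CSV file path
df = spark.read.csv(csv_file_path, header=True, inferSchema=True)

# Register a temporary view so the data can be queried with SQL
df.createOrReplaceTempView("csv_table")

start_time = time.time()
# Query the data with Spark SQL
sql_query = """
SELECT max(col_18) AS final FROM csv_table
"""
result_df = spark.sql(sql_query)
# Show the query result
result_df.show()
print(f"Elapsed time (time module): {time.time() - start_time}")
# Author's run: Elapsed time (time module): 0.9699978828430176

# Stop the SparkSession
spark.stop()
```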
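To sanity-check the value the demo prints, the same aggregate can be computed without Spark by streaming the CSV once. This sketch assumes `col_18` is numeric and that empty cells correspond to the NULLs that SQL `max()` ignores; it is single-threaded, so it is only meant as a correctness check, not a speed comparison. `column_max` is a hypothetical helper.

```python
import csv
from typing import Iterable, Optional


def column_max(lines: Iterable[str], column: str = "col_18") -> Optional[float]:
    """Return the numeric maximum of one CSV column, read row by row.

    Assumes the first line is a header (like header=True in the demo) and
    that the column is numeric. Empty cells are skipped, as SQL max() skips NULLs.
    """
    best: Optional[float] = None
    for row in csv.DictReader(lines):
        cell = row.get(column)
        if cell is None or not cell.strip():
            continue
        value = float(cell)
        if best is None or value > best:
            best = value
    return best
```

Usage, assuming the file is UTF-8: `with open("../large_test_file.csv", newline="", encoding="utf-8") as f: print(column_max(f))`.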

Environment

Python 3.10

```text
annotated-types==0.7.0
anyio==4.4.0
certifi==2024.2.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
dask==2024.1.1
dask_sql==2024.3.0
distributed==2024.1.1
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
fsspec==2024.5.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
idna==3.7
importlib_metadata==7.1.0
Jinja2==3.1.4
locket==1.0.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
msgpack==1.0.8
numpy==1.26.4
orjson==3.10.3
packaging==24.0
pandas==2.2.2
partd==1.4.2
prompt_toolkit==3.0.45
psutil==5.9.8
py4j==0.10.9.7
pydantic==2.7.1
pydantic_core==2.18.2
Pygments==2.18.0
pyspark==3.5.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
rich==13.7.1
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
starlette==0.37.2
tabulate==0.9.0
tblib==3.0.0
toolz==0.12.1
tornado==6.4
typer==0.12.3
typing_extensions==4.12.0
tzdata==2024.1
tzlocal==5.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.0
watchfiles==0.22.0
wcwidth==0.2.13
websockets==12.0
zict==3.0.0
zipp==3.19.0
```
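To check whether your own environment matches this one, installed package versions can be compared against the list above. This sketch only pins pyspark and py4j (both appear in the list) plus the Python 3.10 interpreter; `EXPECTED`, `version_mismatches` and `python_matches` are hypothetical names, and the `lookup` parameter exists so the function can be tested without those packages installed.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Key pins taken from the environment list; extend with any other entries you care about.
EXPECTED: Dict[str, str] = {"pyspark": "3.5.1", "py4j": "0.10.9.7"}


def _installed(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = _installed,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map package -> (expected, installed) for every package that differs."""
    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        have = lookup(name)
        if have != want:
            result[name] = (want, have)
    return result


def python_matches(major: int = 3, minor: int = 10) -> bool:
    """True when the running interpreter is the expected major.minor version."""
    return sys.version_info[:2] == (major, minor)
```

Usage: `print(version_mismatches(EXPECTED), python_matches())` should print `{} True` in an environment identical to this one.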

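Disclaimer: this article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author, and this site assumes no related legal liability. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/笔触狂放9/article/detail/663330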