
Getting Started with Spark (for Beginners)
  • Step 1: Download Spark from http://mirrors.shu.edu.cn/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
  • Step 2: Transfer the downloaded archive to your Linux machine (e.g. with the rz command; it is best to create a directory such as /opt first and put the file there)
  • Step 3: cd into /opt and extract the archive:
    tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz
  • Step 4: cd into /etc and open the profile file with vi, then append the lines below (note: adjust the paths if yours differ) and run source /etc/profile so they take effect:
    #Spark environment
    export SPARK_HOME=/opt/spark/spark-2.3.0-bin-hadoop2.7/
    export PATH="$SPARK_HOME/bin:$PATH"
  • Step 5: Create a new spark_file_test directory under /opt/spark/:
    mkdir spark_file_test
  • Step 6: Create a file in /opt/spark/spark_file_test:
    touch hello_spark
  • Step 7: Edit the hello_spark file and enter some test data (five identical lines, which the count in Step 11 will confirm):
    vi hello_spark
    hello spark!
    hello spark!
    hello spark!
    hello spark!
    hello spark!
  • Step 8: Go back to the bin directory: cd /opt/spark/spark-2.3.0-bin-hadoop2.7/bin
  • Step 9: Run spark-shell; log output like the following means it started successfully (a quick smoke test is sketched after the log):
    2018-04-30 09:35:53 WARN Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.159.128 instead (on interface eth0)
    2018-04-30 09:35:53 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-04-30 09:35:57 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    Spark context Web UI available at http://192.168.159.128:4040
    Spark context available as 'sc' (master = local[*], app id = local-1524847005612).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_171)
    Type in expressions to have them evaluated.
    Type :help for more information.
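  • Note: as the banner shows, spark-shell has already created a SparkContext (bound to sc) and a SparkSession (bound to spark), so no setup code is needed. Before touching any files you can run a quick smoke test; the snippet below is a minimal sketch typed at the scala> prompt, and the commented result is what local mode should produce.
    // sc is the SparkContext spark-shell created for us (master = local[*]).
    // Distribute the numbers 1 to 100 across the local cores and add them up.
    val nums = sc.parallelize(1 to 100)
    nums.reduce(_ + _)   // res: Int = 5050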
  • Step 10: Read the file, which returns an RDD (a note on lazy evaluation follows):
    scala> var lines = sc.textFile("../../spark_file_test/hello_spark")
    2018-04-27 09:40:53 WARN  SizeEstimator:66 - Failed to check whether UseCompressedOops is set; assuming yes
    lines: org.apache.spark.rdd.RDD[String] = ../../spark_file_test/hello_spark MapPartitionsRDD[1] at textFile at <console>:24
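  • Note: textFile is lazy, so the line above only records where the data lives; nothing is read from disk until an action runs. The relative path is resolved against the directory spark-shell was started from, which is why this tutorial launches it from the bin directory in Step 8. A small sketch of what that laziness means in practice:
    // Lazy evaluation: this line succeeds even if the path were wrong...
    val lines = sc.textFile("../../spark_file_test/hello_spark")
    // ...a bad path would only fail here, when an action forces the read.
    lines.count()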
  • Step 11: As a test, read the file's line count and its first line (a word-count sketch follows):
    scala> lines.count()
    res0: Long = 5
    scala> lines.first
    res1: String = hello spark!
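  • With the RDD working, the classic next exercise is a word count. The sketch below assumes the lines RDD from Step 10 and the five-line test file from Step 7; like the steps above, it is typed at the scala> prompt.
    // Split each line on whitespace, pair every word with a 1,
    // then sum the 1s per distinct word.
    val counts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.collect().foreach(println)   // prints (hello,5) and (spark!,5)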

    !!!!!!!!!!!!!!!!!!!!!!!!SUCCESSFUL!!!!!!!!!!!!!!!!!!!!!!!!


