Spark Distributed Environment Setup

1. Scala Environment Setup

1) Download the Scala package scala-2.12.10.tgz and extract it under /usr/scala:

[root@hadoop001 scala]# tar -zxvf scala-2.12.10.tgz
[root@hadoop001 scala]# ln -s scala-2.12.10 scala

2) Add the Scala environment variables to /etc/profile:

export SCALA_HOME=/usr/scala/scala
export PATH=$SCALA_HOME/bin:$PATH

3) Save, then reload the file:

[root@hadoop001 scala]# source /etc/profile

4) Verify with scala -version:

[root@hadoop001 scala]# scala -version
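A successful install prints something like:

Scala code runner version 2.12.10 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.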

2. Spark Installation

2.1 Extract

[hadoop@hadoop001 software]$ tar -zxvf spark-2.4.6-bin-2.6.0-cdh5.16.2.tgz -C ~/app/

Create a symlink:

[hadoop@hadoop001 app]$ ln -s spark-2.4.6-bin-2.6.0-cdh5.16.2/ spark

2.2 Edit the environment configuration files

[hadoop@hadoop001 app]$ vi /home/hadoop/.bashrc

#spark

export SPARK_HOME=/home/hadoop/app/spark
export PATH=$PATH:$SPARK_HOME/bin
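Reload the file so the variables take effect in the current shell (same step the later sections apply on the other nodes):

[hadoop@hadoop001 app]$ source /home/hadoop/.bashrc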

---------------------------------------- local deployment mode: spark-env.sh (paths here reflect the author's environment; adjust them to your own layout)

[hadoop@hadoop001 conf]$ cp spark-env.sh.template spark-env.sh

export JAVA_HOME=/usr/java/jdk
export SCALA_HOME=/usr/scala/scala
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/data/app/spark
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop
export YARN_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
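With the local-mode settings in place, a quick sanity check (not from the original text, just a common smoke test) is to start a local shell and run a trivial job:

[hadoop@hadoop001 spark]$ bin/spark-shell --master local[2]
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0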

---------------------------------------- on-YARN deployment mode: spark-defaults.conf

spark.eventLog.enabled          true
spark.eventLog.dir               hdfs://node-247:8020/user/spark/directory

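The event log directory must already exist on HDFS before an application starts, or submission fails; create it up front (host and path follow the config above):

hdfs dfs -mkdir -p hdfs://node-247:8020/user/spark/directory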

Upload the Spark jars to HDFS. Note that hdfs dfs -put needs a destination directory (the /user/spark/jars path is an example):

hdfs dfs -mkdir -p /user/spark/jars
hdfs dfs -put /data/app/spark/jars/* /user/spark/jars/
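Uploading alone is not enough: point Spark at the jars with spark.yarn.jars in spark-defaults.conf, so YARN containers do not re-upload them on every submit (the path below matches the example put command above and is an assumption, not from the original):

spark.yarn.jars                 hdfs://node-247:8020/user/spark/jars/*.jar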

2.3 Edit slaves

[hadoop@hadoop001 conf]$ mv slaves.template slaves
[hadoop@hadoop001 conf]$ vim slaves

Remove localhost and list the worker hosts:

hadoop001
hadoop002
hadoop003

2.4 Configure the environment on hadoop002 and hadoop003

Append the same Spark entries to /home/hadoop/.bashrc on each node, then reload:

#spark
export SPARK_HOME=/home/hadoop/app/spark
export PATH=$PATH:$SPARK_HOME/bin

source .bashrc

2.5 scp to hadoop002 and hadoop003

[hadoop@hadoop001 ~]$ scp -r /home/hadoop/app/spark-2.4.6-bin-2.6.0-cdh5.16.2 hadoop002:/home/hadoop/app/
[hadoop@hadoop001 ~]$ scp -r /home/hadoop/app/spark-2.4.6-bin-2.6.0-cdh5.16.2 hadoop003:/home/hadoop/app/

Create the symlink on each node:

[hadoop@hadoop003 app]$ ln -s spark-2.4.6-bin-2.6.0-cdh5.16.2/ spark

2.6 Adjust the Spark config on hadoop002 and hadoop003

[hadoop@hadoop002 conf]$ pwd
/home/hadoop/app/spark/conf
[hadoop@hadoop002 conf]$ vim spark-env.sh

Set SPARK_LOCAL_IP to the node's own IP address; each node keeps only the line for itself:

export SPARK_LOCAL_IP=192.168.1.183   # on one of the two nodes
export SPARK_LOCAL_IP=192.168.1.175   # on the other

3. Distribute Scala

[root@hadoop001 usr]# scp -r /usr/scala/ hadoop002:/usr/

[root@hadoop001 usr]# scp -r /usr/scala/ hadoop003:/usr/

[root@hadoop001 usr]# scp /etc/profile hadoop002:/etc/
profile                                                                                                                   100% 2016   890.7KB/s   00:00
[root@hadoop001 usr]# scp /etc/profile hadoop003:/etc/
profile       

[root@hadoop002 ~]# source /etc/profile            
[root@hadoop003 ~]# source /etc/profile  

4. Startup

[hadoop@hadoop001 spark]$ sbin/start-all.sh     
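Verify with jps on each node: hadoop001 should show a Master plus a Worker (it is listed in slaves too), and hadoop002/hadoop003 each a Worker. PIDs below are illustrative:

[hadoop@hadoop001 spark]$ jps
25344 Master
25467 Worker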

Check the standalone web UI at hadoop001:8081 (the master UI defaults to 8080; this assumes a customized port). You can also start a shell on YARN:

spark-shell --master yarn

and watch the application in the YARN UI at hadoop001:7776 (8088 by default).

You can test directly with the bundled SparkPi example; make sure the jar name matches the version actually shipped under $SPARK_HOME/examples/jars (the 3.1.1 jar below comes from a different Spark install than the 2.4.6 build used above):

spark-submit --master yarn --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar
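On success the driver output contains a line like (the digits vary between runs):

Pi is roughly 3.1418551418551417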

Spark IDEA Setup

Check the Spark website for the Scala version that matches your Spark version.

Create a Spark module in IDEA, then configure the pom file:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.5</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- This plugin compiles Scala code into class files -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <!-- Bind to Maven's compile phase -->
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
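Once the module builds, a minimal Scala program can verify the setup. This is a sketch (the object name and sample data are made up), using local[*] so it runs without a cluster:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // local[*] keeps the example runnable on the development machine;
    // when submitting with spark-submit, drop setMaster and pass --master instead.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Count words in a small in-memory collection to avoid external file paths.
    val lines = sc.parallelize(Seq("hello spark", "hello scala"))
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)

    sc.stop()
  }
}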

After importing, download and install Scala:

https://www.scala-lang.org/download/

Then install the Scala plugin in IDEA: open Settings > Plugins, search for scala, and install it.

If the installation fails, download the plugin manually and install it from disk (a VPN can speed up the download):

https://plugins.jetbrains.com/plugin/1347-scala

In Settings, click the gear button at the top right and choose Install Plugin from Disk.

Pick the version that matches your IDEA release.

Then configure the Scala SDK:

ctrl+shift+alt + S

This opens Project Structure; configure the Scala SDK under Global Libraries.
