Download link: Scala 2.12.15 | The Scala Programming Language
Deploy Scala locally as follows:
- [peizk@hadoop software]$ wget https://downloads.lightbend.com/scala/2.12.15/scala-2.12.15.tgz
- --2022-04-09 10:42:23-- https://downloads.lightbend.com/scala/2.12.15/scala-2.12.15.tgz
- Resolving downloads.lightbend.com (downloads.lightbend.com)... 13.225.173.82, 13.225.173.69, 13.225.173.49, ...
- Connecting to downloads.lightbend.com (downloads.lightbend.com)|13.225.173.82|:443... connected.
- HTTP request sent, awaiting response... 200 OK
- Length: 21087658 (20M) [application/octet-stream]
- Saving to: ‘scala-2.12.15.tgz’
-
- 100%[==========================================================================================================================>] 21,087,658 4.28MB/s in 5.6s
-
- 2022-04-09 10:42:29 (3.60 MB/s) - ‘scala-2.12.15.tgz’ saved [21087658/21087658]
- [peizk@hadoop software]$ tar -zxvf scala-2.12.15.tgz -C ~/app
Add the following to /etc/profile:
- #SCALA_HOME
- export SCALA_HOME=/home/peizk/app/scala-2.12.15
- export PATH=$PATH:$SCALA_HOME/bin
-
After configuring, source it:
[root@hadoop app]# source /etc/profile
- [root@hadoop bin]# ln -s /home/peizk/app/scala-2.12.15/bin/scala /usr/bin/scala
-
- [peizk@hadoop ~]$ scala -version
- Scala code runner version 2.12.15 -- Copyright 2002-2021, LAMP/EPFL and Lightbend, Inc.
Deployment successful.
Next, compile Spark from source. In dev/make-distribution.sh of the uploaded Spark 3.2.1 source tree, comment out the lines that detect the version information and specify the versions yourself (the edited block is sketched below).
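A minimal sketch of what the hard-coded block could look like, assuming the variable names used by the stock Spark 3.2.1 dev/make-distribution.sh; the values simply mirror the build command used below:
- # comment out the four $("$MVN" help:evaluate ...) lines and hard-code the results instead:
- VERSION=3.2.1
- SCALA_VERSION=2.12
- SPARK_HADOOP_VERSION=3.1.3
- SPARK_HIVE=1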
[root@hadoop dev]# ./change-scala-version.sh 2.12
[root@hadoop spark-3.2.1]# ./dev/make-distribution.sh --name 3.2.1-hadoop3.1.3 --tgz -Phive -Phive-thriftserver -Pyarn -Dhadoop.version=3.1.3 -Dscala.version=2.12.15
The build takes quite a while. When it finishes, the compiled tarball can be found in the Spark source home directory (spark-3.2.1).
If during the build you see the script downloading its own Maven/Scala via curl, for example:
exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/...
configure Maven on the Linux host yourself (with the Aliyun mirror set as the repository mirror) and point the MVN variable in dev/make-distribution.sh under the Spark home directory at that local Maven (sketched below).
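The exact spot was shown as a screenshot in the original post. Roughly, and assuming the stock script's MVN variable (the local Maven path here is only a placeholder):
- # MVN="$SPARK_HOME/build/mvn"    <- original line, comment it out
- MVN=/path/to/your/local/maven/bin/mvn    # point at the locally installed Maven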
Another problem hit during the compilation was this JVM warning:
Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled
Solution: give Maven more memory. Add the following to /etc/profile:
export MAVEN_OPTS="-Xms1024m -Xmx1024m -Xss1m"
Don't forget to source it!
Copy the compiled package into the software directory, extract it to app, and rename it:
[root@hadoop spark-3.2.1]# cp spark-3.2.1-bin-3.2.1-hadoop3.1.3.tgz /home/peizk/software
[peizk@hadoop software]$ tar -zxvf spark-3.2.1-bin-3.2.1-hadoop3.1.3.tgz -C ~/app
[peizk@hadoop app]$ mv spark-3.2.1-bin-3.2.1-hadoop3.1.3 spark-3.2.1
Add SPARK_HOME to /etc/profile:
- #SPARK_HOME
- export SPARK_HOME=/home/peizk/app/spark-3.2.1
- export PATH=$PATH:$SPARK_HOME/bin
Don't forget to source it.
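A quick sanity check, not in the original post: spark-submit should now be on the PATH and print the version banner for this build:
spark-submit --version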
[peizk@hadoop conf]$ cp spark-env.sh.template spark-env.sh
Configure the three *_CONF_DIR variables in it, as follows:
- # Options read in YARN client/cluster mode
- # - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
- SPARK_CONF_DIR=${SPARK_HOME}/conf
- # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
- HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
- # - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
- YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Copy the MySQL JDBC driver (needed to reach the Hive metastore) over from Hive's lib directory, as follows:
[peizk@hadoop lib]$ cp mysql-connector-java-5.1.47.jar ../../spark-3.2.1/jars/
Copy Hive's configuration file hive-site.xml into Spark's conf directory:
[peizk@hadoop spark-3.2.1]$ cp ../hive-3.1.2/conf/hive-site.xml conf/
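The post does not verify the Hive integration at this point. Assuming the Hive metastore (MySQL) from the earlier Hive setup is reachable, a minimal check would be the spark-sql CLI:
bin/spark-sql -e "show databases;"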
Test the deployment by submitting the built-in SparkPi example to YARN:
- bin/spark-submit \
- --class org.apache.spark.examples.SparkPi \
- --master yarn \
- /home/peizk/app/spark-3.2.1/examples/jars/spark-examples_2.12-3.2.1.jar 10
Running it failed; the error is roughly as follows:
- Caused by: java.io.IOException: Connection reset by peer
- at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
- at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
- at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
- at sun.nio.ch.IOUtil.read(IOUtil.java:192)
- at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
- at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
- at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
- at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
- at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
- at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
- at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
- at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
- at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
- at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
- at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
- at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
- at java.lang.Thread.run(Thread.java:748)
Some searching suggested this is caused by the YARN memory settings being too small, so try increasing the virtual-memory ratio. Modify the file:
[peizk@hadoop hadoop]$ vim yarn-site.xml
Add the following:
- <property>
-     <name>yarn.nodemanager.vmem-pmem-ratio</name>
-     <value>4</value>
- </property>
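For reference, when a container is killed for exceeding YARN's virtual-memory limit, another commonly seen workaround is to disable the virtual-memory check entirely instead of raising the ratio; this is a standard NodeManager property, not something the post itself uses:
- <property>
-     <name>yarn.nodemanager.vmem-check-enabled</name>
-     <value>false</value>
- </property>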
Restart the cluster and resubmit the job; it now runs successfully. Spark is configured successfully!
Next, configure the Spark history server. In Spark's conf directory, copy spark-defaults.conf from its template:
[peizk@hadoop conf]$ cp spark-defaults.conf.template spark-defaults.conf
Modify two places in it: turn event logging on, and change the event log directory to the spark-log directory we created on HDFS (sketched below).
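The post shows these two lines only as a screenshot; the standard spark-defaults.conf keys involved would be the following (the HDFS address matches the history-server setting below, and the /spark-log directory is assumed to already exist on HDFS):
- spark.eventLog.enabled    true
- spark.eventLog.dir        hdfs://hadoop:9000/spark-log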
Then add the following to spark-env.sh:
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop:9000/spark-log"
[peizk@hadoop sbin]$ ./start-history-server.sh
Visit ip:18080 in a browser; the history server UI is up, so it started successfully!
Now run a small test. Create a file spark-test.txt with the following contents:
- [peizk@hadoop spark-3.2.1]$ cat spark-test.txt
- aa,bb,cc
- aa,c,cc
- aa
Create a directory on HDFS and upload the file to it:
- [peizk@hadoop spark-3.2.1]$ hadoop fs -mkdir /spark-test
- [peizk@hadoop spark-3.2.1]$ hadoop fs -put spark-test.txt /spark-test/
[peizk@hadoop spark-3.2.1]$ spark-shell
- scala> sc.textFile("/spark-test/spark-test.txt").foreach(println)
- aa
- aa,bb,cc
- aa,c,cc
- scala> sc.textFile("/spark-test/spark-test.txt").flatMap(x => x.split(",")).map(x =>(x,1)).reduceByKey(_+_).foreach(println)
- (aa,3)
- (c,1)
- (bb,1)
- (cc,2)
sc.textFile("/spark-test/spark-test.txt").flatMap(_.split(",")).map((_, 1)).groupByKey().map(tuple => {(tuple._1, tuple._2.sum)}).collect().foreach(println)
The Spark Driver is the node that runs the main method of a Spark application and is responsible for the actual execution of the user code. While a job is running, the Driver is mainly responsible for turning the user program into jobs, scheduling tasks across the Executors, and tracking how the Executors are doing.
Simply put, the Driver is the program that drives the whole application along; it is also referred to as the Driver class.
A Spark Executor is a JVM process on a worker node of the cluster that runs the concrete tasks of a Spark job; the tasks are independent of each other. Executors are launched when the Spark application starts and stay alive for the whole lifetime of the application. If an Executor fails or crashes, the application can still continue to run: the tasks on the failed node are rescheduled onto other Executors.
The Executor has two core functions: running the tasks that make up the Spark application and returning the results to the Driver, and providing in-memory storage, via its block manager, for RDDs that the user program caches.
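To tie these concepts back to the deployment above: the standard spark-submit resource flags control how much memory the Driver gets and how many Executors run the tasks. The values below are only illustrative, not a recommendation from the post:
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --driver-memory 1g \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1 \
  /home/peizk/app/spark-3.2.1/examples/jars/spark-examples_2.12-3.2.1.jar 10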