
Spark on YARN Environment Configuration

1. Preparation

Complete the Spark Standalone HA environment setup first (see the Spark Standalone HA environment configuration tutorial).

2. Modify the configuration files

1. Modify spark-env.sh

cd /export/server/spark/conf
vim /export/server/spark/conf/spark-env.sh

# Add the following lines to spark-env.sh:
HADOOP_CONF_DIR=/export/server/hadoop-3.3.0/etc/hadoop/
YARN_CONF_DIR=/export/server/hadoop-3.3.0/etc/hadoop/
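
Spark finds the cluster through these two variables, so a wrong path usually shows up only later, when a job is submitted. As a minimal sketch, the check below confirms that a directory holds the client configuration files. The function name check_hadoop_conf_dir is hypothetical, and the list of files checked (core-site.xml and yarn-site.xml) is an assumption about a typical Hadoop client directory.

```shell
# Hypothetical helper: verify that a Hadoop client config directory
# contains the files Spark on YARN usually needs (assumed list).
check_hadoop_conf_dir() {
    conf_dir="$1"
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            return 1
        fi
    done
    echo "ok: $conf_dir"
}

# Path taken from the spark-env.sh lines above
check_hadoop_conf_dir /export/server/hadoop-3.3.0/etc/hadoop \
    || echo "fix HADOOP_CONF_DIR before continuing"
```

Running the same check on node2 and node3 after distributing the file would catch a node whose Hadoop directory differs.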

# Distribute spark-env.sh to node2 and node3
cd /export/server/spark/conf
scp -r spark-env.sh node2:$PWD
scp -r spark-env.sh node3:$PWD

2. Modify Hadoop's yarn-site.xml

cd /export/server/hadoop-3.3.0/etc/hadoop/
vim /export/server/hadoop-3.3.0/etc/hadoop/yarn-site.xml

# Contents of yarn-site.xml after the changes:
<?xml version="1.0"?>

<!-- Newly added configuration -->
<configuration>

<!-- Site specific YARN configuration properties -->
<!-- Host that runs the YARN ResourceManager (the cluster's master role) -->
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>


    <!-- Memory allocation settings for the YARN cluster -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>


<!-- Whether to enforce physical memory limits on containers -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<!-- Whether to enforce virtual memory limits on containers -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<!-- Enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- URL of the YARN history (log) server -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
</property>

<!-- Retain aggregated logs for 7 days (604800 seconds) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
</configuration>

# Distribute yarn-site.xml to node2 and node3
cd /export/server/hadoop-3.3.0/etc/hadoop
scp -r yarn-site.xml node2:$PWD
scp -r yarn-site.xml node3:$PWD
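
The memory settings in yarn-site.xml determine how many Spark executors fit on each node. The sketch below works through the arithmetic under two assumptions that are not stated in the original: Spark's default executor memory overhead (10% of executor memory, with a 384 MB floor) and a scheduler that rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb. The helper name executor_container_mb is hypothetical.

```shell
# Values taken from the yarn-site.xml above
NODE_MEM_MB=20480      # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB=2048      # yarn.scheduler.minimum-allocation-mb

# Hypothetical helper: estimate the container size YARN grants for a Spark
# executor. Assumes the default overhead (10% of executor memory, at least
# 384 MB) and rounding up to a multiple of the minimum allocation.
executor_container_mb() {
    exec_mb="$1"
    overhead_mb=$(( exec_mb / 10 ))
    if [ "$overhead_mb" -lt 384 ]; then overhead_mb=384; fi
    total_mb=$(( exec_mb + overhead_mb ))
    echo $(( (total_mb + MIN_ALLOC_MB - 1) / MIN_ALLOC_MB * MIN_ALLOC_MB ))
}

for exec_mb in 1024 4096; do
    size=$(executor_container_mb "$exec_mb")
    echo "executor ${exec_mb} MB -> container ${size} MB, $(( NODE_MEM_MB / size )) per node"
done
```

Under these assumptions a 1 GB executor still occupies a full 2048 MB container, so at most 10 such containers fit on a 20480 MB node, and the ApplicationMaster takes one container as well.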

3. Configure the Spark history server address

cd /export/server/spark/conf
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf

# Add the following lines to spark-defaults.conf:
spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs://node1:8020/sparklog/
spark.eventLog.compress                 true
spark.yarn.historyServer.address        node1:18080
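
Spark writes event logs to spark.eventLog.dir and generally expects that directory to exist before the first job starts. The original does not show it being created (it may already exist from the Standalone HA setup), so the following is a hedged sketch: it reads the value back from spark-defaults.conf and creates the directory only when the hadoop CLI is available. The helper name get_spark_conf is hypothetical.

```shell
# Hypothetical helper: print the value of a key in a spark-defaults.conf
# style file (whitespace-separated "key value" lines; the last match wins).
get_spark_conf() {
    awk -v k="$1" '$1 == k { v = $2; found = 1 } END { if (found) print v }' "$2"
}

conf=/export/server/spark/conf/spark-defaults.conf
if [ -f "$conf" ] && command -v hadoop >/dev/null 2>&1; then
    log_dir=$(get_spark_conf spark.eventLog.dir "$conf")
    # Create the event log directory on HDFS if it is missing
    if [ -n "$log_dir" ]; then
        hadoop fs -mkdir -p "$log_dir"
    fi
fi
```

Reading the path from the file rather than typing it twice keeps the HDFS directory and the configuration in agreement.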

# Create log4j.properties from the template and open it for editing
cd /export/server/spark/conf
cp log4j.properties.template log4j.properties
vim log4j.properties

# Change the root logger line to:
log4j.rootCategory=WARN, console
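
The same edit can be applied without opening an editor, which helps when the step has to be repeated. This is a minimal sketch, assuming the file contains a single line starting with log4j.rootCategory= as in the template; the helper name set_root_log_level is hypothetical.

```shell
# Hypothetical helper: rewrite the log4j.rootCategory line of a
# log4j.properties file to the given level, keeping the console appender.
set_root_log_level() {
    file="$1"
    level="$2"
    sed "s/^log4j\.rootCategory=.*/log4j.rootCategory=${level}, console/" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}

props=/export/server/spark/conf/log4j.properties
if [ -f "$props" ]; then
    set_root_log_level "$props" WARN
fi
```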

# Distribute both files to node2 and node3
cd /export/server/spark/conf
scp -r spark-defaults.conf log4j.properties node2:$PWD
scp -r spark-defaults.conf log4j.properties node3:$PWD

4. Configure the Spark dependency jars

hadoop fs -mkdir -p /spark/jars/
hadoop fs -put /export/server/spark/jars/* /spark/jars/

# Point Spark at the jars uploaded to HDFS
cd /export/server/spark/conf
vim spark-defaults.conf

# Add the following line to spark-defaults.conf:
spark.yarn.jars  hdfs://node1:8020/spark/jars/*

# Distribute the updated spark-defaults.conf to node2 and node3
cd /export/server/spark/conf
scp -r spark-defaults.conf root@node2:$PWD
scp -r spark-defaults.conf root@node3:$PWD

3. Start the services

start-dfs.sh
start-yarn.sh
# Hadoop 3.x form; the older "mr-jobhistory-daemon.sh start historyserver" is deprecated
mapred --daemon start historyserver
/export/server/spark/sbin/start-history-server.sh
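
Once the daemons are up, submitting the bundled SparkPi example is a common way to confirm that Spark can reach YARN. The original stops at starting the services, so this is only a sketch: it assumes the Spark distribution under /export/server/spark ships examples/jars/spark-examples_*.jar, and the helper name sparkpi_cmd is hypothetical. The function only prints the command so it can be reviewed before it is run.

```shell
SPARK_HOME="${SPARK_HOME:-/export/server/spark}"

# Hypothetical helper: print a spark-submit command that runs the bundled
# SparkPi example on YARN in the given deploy mode (client or cluster).
sparkpi_cmd() {
    mode="${1:-client}"
    echo "$SPARK_HOME/bin/spark-submit --master yarn --deploy-mode $mode" \
         "--class org.apache.spark.examples.SparkPi" \
         "$SPARK_HOME/examples/jars/spark-examples_*.jar 10"
}

sparkpi_cmd cluster
```

To actually submit the job, run eval "$(sparkpi_cmd cluster)". If the setup is correct, the application should appear in the ResourceManager web UI (port 8088 by default) and, after it finishes, in the Spark history server at node1:18080.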