
Installing and Configuring Apache Hadoop 3.x (with the official YARN configuration)

Table of Contents

0. Related Articles

1. Hadoop Deployment

1.1. Cluster Deployment Plan

1.2. Upload the Installation Package to /opt/software

1.3. Extract the Installation Package

1.4. Configure Hadoop Environment Variables

2. Configure the Cluster

2.1. Core Configuration File

2.2. HDFS Configuration File

2.3. YARN Configuration File

2.4. MapReduce Configuration File

2.5. Configure workers

3. Configure the History Server

4. Configure Log Aggregation

5. Distribute Hadoop

6. Start the Cluster

6.1. Format the NameNode

6.2. Start HDFS

6.3. Start YARN

6.4. Fix HDFS and YARN Startup Errors

7. Cluster One-Click Start/Stop Script


0. Related Articles

Big Data Fundamentals article index

1. Hadoop Deployment

1.1. Cluster Deployment Plan

Note: do not install the NameNode and the SecondaryNameNode on the same server.
Note: the ResourceManager also consumes a lot of memory; do not place it on the same machine as the NameNode or the SecondaryNameNode.

|      | bigdata1           | bigdata2                     | bigdata3                    |
|------|--------------------|------------------------------|-----------------------------|
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |

1.2. Upload the Installation Package to /opt/software

Upload hadoop-3.1.3.tar.gz into the software folder under the /opt directory.

1.3. Extract the Installation Package

  # extract the installation package
  cd /opt/software/
  tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
  # create a symbolic link
  ln -s /opt/module/hadoop-3.1.3/ /opt/module/hadoop

1.4. Configure Hadoop Environment Variables

Edit the /etc/profile file, add the Hadoop environment variables, and distribute the file to all machines:

  # edit /etc/profile
  vim /etc/profile
  # add the following
  ## HADOOP_HOME
  export HADOOP_HOME=/opt/module/hadoop
  export PATH=$PATH:$HADOOP_HOME/bin
  export PATH=$PATH:$HADOOP_HOME/sbin
  # save and quit
  :wq
  # distribute the environment file
  xsync /etc/profile
  # source the environment file on every machine
  source /etc/profile
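To confirm the PATH change took effect, a quick local check can be run (on a machine where Hadoop is actually installed, `hadoop version` would be the definitive test):

```shell
# replicate the two PATH exports from /etc/profile above
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# sanity check: the Hadoop bin directory is now on PATH
echo "$PATH" | grep -q "/opt/module/hadoop/bin" && echo OK   # prints OK
```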

2. Configure the Cluster

2.1. Core Configuration File

Configure core-site.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
  -->
  <!-- Put site-specific property overrides in this file. -->
  <configuration>
      <!-- NameNode address -->
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://bigdata1:8020</value>
      </property>
      <!-- Hadoop data storage directory -->
      <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/module/hadoop/data</value>
      </property>
      <!-- static user for the HDFS web UI, set to root -->
      <property>
          <name>hadoop.http.staticuser.user</name>
          <value>root</value>
      </property>
  </configuration>

2.2. HDFS Configuration File

Configure hdfs-site.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <!-- Apache License header, same as in core-site.xml -->
  <!-- Put site-specific property overrides in this file. -->
  <configuration>
      <!-- NameNode web UI address -->
      <property>
          <name>dfs.namenode.http-address</name>
          <value>bigdata1:9870</value>
      </property>
      <!-- SecondaryNameNode web UI address -->
      <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>bigdata3:9868</value>
      </property>
      <!-- replication factor of 1 for this test environment -->
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
  </configuration>

2.3. YARN Configuration File

Configure yarn-site.xml:

  <?xml version="1.0"?>
  <!-- Apache License header, same as in core-site.xml -->
  <configuration>
      <!-- Site specific YARN configuration properties -->
      <!-- run MapReduce shuffle as an auxiliary service -->
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
      <!-- ResourceManager address -->
      <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>bigdata2</value>
      </property>
      <!-- environment variables inherited by containers -->
      <property>
          <name>yarn.nodemanager.env-whitelist</name>
          <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
      </property>
      <!-- minimum and maximum memory a YARN container may be allocated -->
      <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>512</value>
      </property>
      <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>4096</value>
      </property>
      <!-- physical memory the NodeManager may manage -->
      <property>
          <name>yarn.nodemanager.resource.memory-mb</name>
          <value>4096</value>
      </property>
      <!-- disable YARN's virtual-memory limit check -->
      <property>
          <name>yarn.nodemanager.vmem-check-enabled</name>
          <value>false</value>
      </property>
  </configuration>

2.4. MapReduce Configuration File

Configure mapred-site.xml:

  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <!-- Apache License header, same as in core-site.xml -->
  <!-- Put site-specific property overrides in this file. -->
  <configuration>
      <!-- run MapReduce programs on YARN -->
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
  </configuration>

2.5. Configure workers

Configure the workers file:

  bigdata1
  bigdata2
  bigdata3
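The workers file must contain exactly one hostname per line, with no blank lines and no trailing whitespace, or the start scripts may try to reach nonexistent hosts. A quick way to generate and verify it (WORKERS_FILE is a convenience variable for this sketch; the real file lives under etc/hadoop in the install directory):

```shell
# WORKERS_FILE defaults to the real location; point it elsewhere for a dry run
WORKERS_FILE=${WORKERS_FILE:-/opt/module/hadoop/etc/hadoop/workers}
printf '%s\n' bigdata1 bigdata2 bigdata3 > "$WORKERS_FILE"
# verify: exactly the three hostnames, no blank lines
grep -c . "$WORKERS_FILE"   # prints 3
```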

3. Configure the History Server

Add the following to the mapred-site.xml configuration file:

  <!-- JobHistory server address -->
  <property>
      <name>mapreduce.jobhistory.address</name>
      <value>bigdata1:10020</value>
  </property>
  <!-- JobHistory server web UI address -->
  <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>bigdata1:19888</value>
  </property>

4. Configure Log Aggregation

Log aggregation: after an application finishes running, its logs are uploaded to HDFS.
Benefit: run details can be inspected conveniently, which makes development and debugging easier.
Note: enabling log aggregation requires restarting the NodeManager, the ResourceManager, and the JobHistory server.

To enable log aggregation, add the following to yarn-site.xml:

  <!-- enable log aggregation -->
  <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
  </property>
  <!-- log server URL -->
  <property>
      <name>yarn.log.server.url</name>
      <value>http://bigdata1:19888/jobhistory/logs</value>
  </property>
  <!-- retain logs for 7 days -->
  <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>604800</value>
  </property>
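The 604800 in yarn.log-aggregation.retain-seconds is simply seven days expressed in seconds:

```shell
# 7 days * 24 hours * 60 minutes * 60 seconds
echo $((7 * 24 * 60 * 60))   # prints 604800
```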

5. Distribute Hadoop

Distribute the Hadoop installation to the other machines with a script; the distribution script is described in another of my posts: "An xsync cluster distribution script for CentOS 7".

Note: before running the script, make sure the /opt/module directory exists on the other machines.

  # both the hadoop symlink and the real directory must be sent
  # (run the following in /opt/module)
  xsync hadoop
  xsync hadoop-3.1.3/

6. Start the Cluster

6.1. Format the NameNode

If the cluster is being started for the first time, format the NameNode on the bigdata1 node. (Before formatting, be sure to stop any NameNode and DataNode processes left over from a previous start, and then delete the data and logs directories.)

  # since the environment variables are configured, this is all that is
  # needed (run it on the bigdata1 machine)
  hdfs namenode -format
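The pre-format cleanup described above can be sketched as follows; HADOOP_DIR is a convenience variable for this sketch, and on a real cluster the cleanup has to run on every node:

```shell
# stop all running daemons first (stop-dfs.sh / stop-yarn.sh on their nodes),
# then remove the old NameNode/DataNode state and logs
HADOOP_DIR=${HADOOP_DIR:-/opt/module/hadoop}
rm -rf "$HADOOP_DIR/data" "$HADOOP_DIR/logs"
```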

6.2. Start HDFS

Run the start-dfs.sh command.

It will, however, fail with the following errors:

  [root@bigdata1 module]# start-dfs.sh
  Starting namenodes on [bigdata1]
  ERROR: Attempting to operate on hdfs namenode as root
  ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
  Starting datanodes
  ERROR: Attempting to operate on hdfs datanode as root
  ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
  Starting secondary namenodes [bigdata3]
  ERROR: Attempting to operate on hdfs secondarynamenode as root
  ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

To fix this, add the following parameters at the top of both start-dfs.sh and stop-dfs.sh (found in the sbin directory of the Hadoop installation):

  HDFS_DATANODE_USER=root
  HADOOP_SECURE_DN_USER=hdfs
  HDFS_NAMENODE_USER=root
  HDFS_SECONDARYNAMENODE_USER=root


Run start-dfs.sh again; this time it fails with the following errors:

  [root@bigdata1 module]# start-dfs.sh
  WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
  Starting namenodes on [bigdata1]
  Last login: Mon Mar 28 20:08:26 CST 2022 from 192.168.12.1 on pts/0
  bigdata1: ERROR: JAVA_HOME is not set and could not be found.
  Starting datanodes
  Last login: Mon Mar 28 23:56:48 CST 2022 on pts/0
  bigdata1: ERROR: JAVA_HOME is not set and could not be found.
  bigdata2: ERROR: JAVA_HOME is not set and could not be found.
  bigdata3: ERROR: JAVA_HOME is not set and could not be found.
  Starting secondary namenodes [bigdata3]
  Last login: Mon Mar 28 23:56:48 CST 2022 on pts/0
  bigdata3: ERROR: JAVA_HOME is not set and could not be found.

Fix this by setting the Java and Hadoop environment variables in the hadoop-env.sh configuration file:

  # add the following in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
  export JAVA_HOME=/usr/java/jdk1.8.0_181
  export HADOOP_HOME=/opt/module/hadoop

Note: this configuration file must be distributed to all machines.

Run start-dfs.sh again; this time HDFS starts successfully.

HDFS can then be inspected through its web UI:

http://bigdata1:9870/explorer.html#/

6.3. Start YARN

Run the start-yarn.sh command on the node configured as the ResourceManager (bigdata2) to start YARN.

As before, it fails with the following errors:

  [root@bigdata2 ~]# start-yarn.sh
  Starting resourcemanager
  ERROR: Attempting to operate on yarn resourcemanager as root
  ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
  Starting nodemanagers
  ERROR: Attempting to operate on yarn nodemanager as root
  ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

To fix this, add the following parameters at the top of both start-yarn.sh and stop-yarn.sh (found in the sbin directory of the Hadoop installation):

  YARN_RESOURCEMANAGER_USER=root
  HADOOP_SECURE_DN_USER=yarn
  YARN_NODEMANAGER_USER=root

Run start-yarn.sh again; this time YARN starts successfully. (Note: passwordless SSH from bigdata2 to the other machines must be configured.)

YARN can then be inspected through its web UI:

http://bigdata2:8088/cluster

6.4. Fix HDFS and YARN Startup Errors

Besides the fixes above, the HDFS and YARN startup errors can also be resolved by adding the following to the /opt/module/hadoop/etc/hadoop/hadoop-env.sh file:

  export HDFS_NAMENODE_USER=root
  export HDFS_DATANODE_USER=root
  export HDFS_SECONDARYNAMENODE_USER=root
  export YARN_RESOURCEMANAGER_USER=root
  export YARN_NODEMANAGER_USER=root

These settings declare which user is allowed to start each daemon; here every daemon is permitted to run as root, so the cluster starts normally.
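The five exports can be appended to hadoop-env.sh in one step; ENV_FILE below is just a convenience variable for this sketch, not a Hadoop setting:

```shell
# append the per-daemon user declarations to hadoop-env.sh
ENV_FILE=${ENV_FILE:-/opt/module/hadoop/etc/hadoop/hadoop-env.sh}
cat >> "$ENV_FILE" <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
# remember to distribute the updated file afterwards, e.g. with xsync
```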

7. Cluster One-Click Start/Stop Script

In the /root/bin directory, create a hadoop.sh file (vim /root/bin/hadoop.sh) with the following content:

  #!/bin/bash
  if [ $# -lt 1 ]
  then
      echo "No Args Input..."
      exit ;
  fi
  case $1 in
  "start")
      echo " =================== starting the hadoop cluster ==================="
      echo " --------------- starting hdfs ---------------"
      ssh bigdata1 "/opt/module/hadoop/sbin/start-dfs.sh"
      echo " --------------- starting yarn ---------------"
      ssh bigdata2 "/opt/module/hadoop/sbin/start-yarn.sh"
      echo " --------------- starting historyserver ---------------"
      ssh bigdata1 "/opt/module/hadoop/bin/mapred --daemon start historyserver"
  ;;
  "stop")
      echo " =================== stopping the hadoop cluster ==================="
      echo " --------------- stopping historyserver ---------------"
      ssh bigdata1 "/opt/module/hadoop/bin/mapred --daemon stop historyserver"
      echo " --------------- stopping yarn ---------------"
      ssh bigdata2 "/opt/module/hadoop/sbin/stop-yarn.sh"
      echo " --------------- stopping hdfs ---------------"
      ssh bigdata1 "/opt/module/hadoop/sbin/stop-dfs.sh"
  ;;
  *)
      echo "Input Args Error..."
  ;;
  esac

Make the script executable:

chmod 777 /root/bin/hadoop.sh

The cluster can then be started and stopped with hadoop.sh start and hadoop.sh stop.


Note: for other related articles, see the Big Data Fundamentals article index.

