1. Prepare three virtual machines: static IP address, firewall disabled, hostname set

The IP addresses and hostnames are as follows:
192.168.56.20,hadoop20
192.168.56.21,hadoop21
192.168.56.22,hadoop22
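The static IP is typically set in the interface's ifcfg file; a minimal sketch for CentOS 7 (the interface name enp0s8 and the exact values other than the IPs are illustrative, not from the original steps):

vim /etc/sysconfig/network-scripts/ifcfg-enp0s8

BOOTPROTO=static        # fixed address instead of DHCP
ONBOOT=yes              # bring the interface up at boot
IPADDR=192.168.56.20    # 192.168.56.21 / 192.168.56.22 on the other two VMs
NETMASK=255.255.255.0

systemctl restart network    # apply the change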
Note: remember to disable the firewall.

Change the hostname: hostnamectl set-hostname xxx

Stop the firewall: systemctl stop firewalld.service
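Note that systemctl stop only keeps the firewall off until the next reboot; to keep it off permanently you can also disable the service (a standard systemd step, not listed in the original):

systemctl disable firewalld.service    # do not start firewalld at boot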
Check the firewall state: firewall-cmd --state

2. Map each IP to its hostname in /etc/hosts on all three VMs
[kgf@hadoop20 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.20 hadoop20
192.168.56.21 hadoop21
192.168.56.22 hadoop22
[kgf@hadoop20 ~]$

3. The virtual machines are now ready
Refer to the earlier Hadoop 2.x material for this part.
1. Install the JDK (skipped here; refer to the earlier material)

2. Install Hadoop

2.1. Upload the downloaded package to /opt/software/ and extract it to /opt/module
[kgf@hadoop20 software]$ ll
total 330156
-rw-rw-r--. 1 kgf kgf 338075860 Jun 14 12:18 hadoop-3.1.3.tar.gz
[kgf@hadoop20 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C ../module/
[kgf@hadoop20 module]$ ll
total 0
drwxr-xr-x. 9 kgf kgf 149 Sep 12  2019 hadoop-3.1.3
drwxr-xr-x. 7 kgf kgf 245 Oct  6  2018 jdk1.8.0_191
[kgf@hadoop20 module]$ pwd
/opt/module
[kgf@hadoop20 module]$

2.2. Add Hadoop to the environment variables
sudo vim /etc/profile
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

2.3. Make the change take effect
source /etc/profile
2.4. Verify the installation
[kgf@hadoop20 module]$ hadoop version
Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar
[kgf@hadoop20 module]$
1. Use the previously configured xsync script to sync the Hadoop directory to hadoop21 and hadoop22
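xsync is the rsync-based distribution script set up in the earlier material; in case it is not already in place, a minimal sketch under the assumptions of this cluster (passwordless SSH between the nodes, rsync installed, the two peer hostnames hardcoded):

#!/bin/bash
# xsync: push files/directories to the other cluster nodes, preserving paths.
if [ $# -lt 1 ]; then
    echo "No Args Input..."
    exit
fi
for file in "$@"; do
    pdir=$(cd -P "$(dirname "$file")"; pwd)   # absolute parent directory
    fname=$(basename "$file")
    echo "fname=$fname pdir=$pdir"
    for host in hadoop21 hadoop22; do
        echo "----------$host--------"
        ssh "$host" "mkdir -p $pdir"          # make sure the target dir exists
        rsync -av "$pdir/$fname" "$host:$pdir"
    done
done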
[kgf@hadoop20 module]$ pwd
/opt/module
[kgf@hadoop20 module]$ xsync hadoop-3.1.3/

2. Configure the Hadoop environment variables (as in 2.2 above) on hadoop21 and hadoop22 as well
1. Cluster deployment plan

hadoop20: NameNode, DataNode, NodeManager
hadoop21: ResourceManager, DataNode, NodeManager
hadoop22: SecondaryNameNode, DataNode, NodeManager
Note:
- Do not install the NameNode and the SecondaryNameNode on the same server
- The ResourceManager also consumes a lot of memory; do not put it on the same machine as the NameNode or the SecondaryNameNode
2. About the configuration files

Hadoop configuration files come in two kinds: default configuration files and custom (site) configuration files. Only when you want to override a default value do you need to edit the corresponding custom file and change the property.
2.1. Default configuration files: core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml, bundled inside the Hadoop jars
2.2. Custom configuration files

core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify them as your deployment requires.
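One way to confirm which value is actually in effect for a given key is the stock hdfs getconf utility; once core-site.xml below is in place, this should print hdfs://hadoop20:8020:

hdfs getconf -confKey fs.defaultFS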
3. Configure the cluster

3.1. Configure core-site.xml, hadoop-env.sh, and yarn-env.sh
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- 指定NameNode的地址 --> <property> <name>fs.defaultFS</name> <value>hdfs://hadoop20:8020</value> </property> <!-- 指定hadoop数据的存储目录 --> <property> <name>hadoop.tmp.dir</name> <value>/opt/module/hadoop-3.1.3/data</value> </property> <!-- 配置HDFS网页登录使用的静态用户为atguigu --> <property> <name>hadoop.http.staticuser.user</name> <value>kgf</value> </property> </configuration>
Set the JDK path in hadoop-env.sh and in yarn-env.sh.
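A minimal sketch of that setting, using the JDK directory unpacked under /opt/module earlier:

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_191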
3.2. Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- nn web端访问地址--> <property> <name>dfs.namenode.http-address</name> <value>hadoop20:9870</value> </property> <!-- 2nn web端访问地址--> <property> <name>dfs.namenode.secondary.http-address</name> <value>hadoop22:9868</value> </property> </configuration>3.3、配置yarn-site.xml
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!-- 指定MR走shuffle --> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!-- 指定ResourceManager的地址--> <property> <name>yarn.resourcemanager.hostname</name> <value>hadoop21</value> </property> <!-- 环境变量的继承 --> <property> <name>yarn.nodemanager.env-whitelist</name> <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value> </property> </configuration>3.4、配置mapred-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- 指定MapReduce程序运行在Yarn上 --> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> </configuration>3.5、在集群上分发配置好的Hadoop配置文件
[kgf@hadoop20 hadoop]$ xsync /opt/module/hadoop-3.1.3/etc/hadoop/
3.6. Check the distributed files on hadoop21 and hadoop22
[kgf@hadoop21 hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
[kgf@hadoop21 hadoop-3.1.3]$ cat etc/hadoop/core-site.xml

[kgf@hadoop22 hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
[kgf@hadoop22 hadoop-3.1.3]$ cat etc/hadoop/core-site.xml
1. Configure workers (on all three servers)
[kgf@hadoop20 hadoop-3.1.3]$ vim etc/hadoop/workers
[kgf@hadoop20 hadoop-3.1.3]$ cat etc/hadoop/workers
hadoop20
hadoop21
hadoop22
[kgf@hadoop20 hadoop-3.1.3]$

Note: entries in this file must have no trailing spaces, and the file must contain no blank lines.
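Rather than editing the file on each machine, it can also be pushed out with the same xsync script used above:

xsync /opt/module/hadoop-3.1.3/etc/hadoop/workers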
2. Start the cluster

If this is the first time the cluster is started, format the NameNode on the hadoop20 node. (Note: formatting the NameNode generates a new cluster ID. If it no longer matches the DataNodes' cluster ID, the cluster cannot find its existing data. If the cluster errors out while running and the NameNode has to be reformatted, first stop the NameNode and DataNode processes and delete the data and logs directories on every machine, and only then format.)
[kgf@hadoop20 hadoop-3.1.3]$ hdfs namenode -format
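Should that reset ever be needed, it looks roughly like this with the paths used in this install (run the rm on every node):

sbin/stop-dfs.sh                                                      # stop NameNode/DataNodes first
rm -rf /opt/module/hadoop-3.1.3/data /opt/module/hadoop-3.1.3/logs    # on every machine
hdfs namenode -format                                                 # then reformat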
Start HDFS:
[kgf@hadoop20 hadoop-3.1.3]$ sbin/start-dfs.sh
Starting namenodes on [hadoop20]
Starting datanodes
hadoop21: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
hadoop22: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
Starting secondary namenodes [hadoop22]
[kgf@hadoop20 hadoop-3.1.3]$ jps
3009 DataNode
2854 NameNode
3225 Jps
[kgf@hadoop20 hadoop-3.1.3]$

[kgf@hadoop21 hadoop-3.1.3]$ jps
1872 Jps
1804 DataNode
[kgf@hadoop21 hadoop-3.1.3]$
[kgf@hadoop22 module]$ jps
1922 SecondaryNameNode
2004 Jps
1850 DataNode
[kgf@hadoop22 module]$

Start YARN on the node where the ResourceManager is configured (hadoop21):
[kgf@hadoop21 hadoop-3.1.3]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[kgf@hadoop21 hadoop-3.1.3]$ jps
2072 ResourceManager
2472 Jps
2203 NodeManager
1804 DataNode
[kgf@hadoop21 hadoop-3.1.3]$

[kgf@hadoop22 module]$ jps
1922 SecondaryNameNode
2211 Jps
2104 NodeManager
1850 DataNode
[kgf@hadoop22 module]$
[kgf@hadoop20 hadoop-3.1.3]$ jps
3009 DataNode
2854 NameNode
3303 NodeManager
3402 Jps
[kgf@hadoop20 hadoop-3.1.3]$

3. View the HDFS NameNode on the web
(a) Open http://hadoop20:9870/ in a browser
(b) Browse the data stored on HDFS

4. View the YARN ResourceManager on the web

(a) Open http://hadoop21:8088/cluster in a browser
(b) View the jobs running on YARN
1. Upload files to the cluster

1.1. Upload a small file
[kgf@hadoop20 hadoop-3.1.3]$ hadoop fs -mkdir /input
[kgf@hadoop20 hadoop-3.1.3]$ ll wcinput/
total 4
-rw-rw-r--. 1 kgf kgf 60 Jun 14 13:20 hello.txt
[kgf@hadoop20 hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
[kgf@hadoop20 hadoop-3.1.3]$ hadoop fs -put wcinput/hello.txt /input
2022-06-14 13:21:17,764 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[kgf@hadoop20 hadoop-3.1.3]$

1.2. Upload a large file
[kgf@hadoop20 hadoop-3.1.3]$ hadoop fs -put /opt/software/hadoop-3.1.3.tar.gz /
2022-06-14 13:24:12,505 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:24:14,059 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:24:16,001 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[kgf@hadoop20 hadoop-3.1.3]$
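Both uploads can be checked from the command line as well as in the NameNode web UI:

hadoop fs -ls /          # should list /input and hadoop-3.1.3.tar.gz
hadoop fs -ls /input     # should list hello.txt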
To review how finished applications ran, configure a history server as follows:

1. Configure mapred-site.xml

Add the following to the file:
<!-- JobHistory server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop20:10020</value>
</property>
<!-- JobHistory server web UI address (on hadoop20, the node that runs the history server) -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop20:19888</value>
</property>

2. Distribute the configuration
[kgf@hadoop20 hadoop-3.1.3]$ xsync etc/hadoop/mapred-site.xml
fname=mapred-site.xml pdir=/opt/module/hadoop-3.1.3/etc/hadoop
----------hadoop21--------
sending incremental file list
mapred-site.xml

sent 579 bytes  received 43 bytes  414.67 bytes/sec
total size is 1194  speedup is 1.92
------hadoop22--------
sending incremental file list
mapred-site.xml

sent 579 bytes  received 43 bytes  1244.00 bytes/sec
total size is 1194  speedup is 1.92
[kgf@hadoop20 hadoop-3.1.3]$

3. Start the history server on hadoop20
[kgf@hadoop20 hadoop-3.1.3]$ mapred --daemon start historyserver
[kgf@hadoop20 hadoop-3.1.3]$ jps
3009 DataNode
2854 NameNode
3734 JobHistoryServer
3303 NodeManager
3759 Jps
[kgf@hadoop20 hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
[kgf@hadoop20 hadoop-3.1.3]$

4. View JobHistory at http://hadoop20:19888/jobhistory
Log aggregation: after an application finishes, its run logs are uploaded to HDFS.

Benefit: the details of a run can be inspected conveniently, which makes development and debugging easier.

Note: enabling log aggregation requires restarting the NodeManagers, the ResourceManager, and the HistoryServer.

The steps to enable it:
1. Configure yarn-site.xml

Add the following to the file:
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Log server URL -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop20:19888/jobhistory/logs</value>
</property>
<!-- Retain logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>

2. Distribute the configuration
[kgf@hadoop20 hadoop-3.1.3]$ xsync etc/hadoop/yarn-site.xml
fname=yarn-site.xml pdir=/opt/module/hadoop-3.1.3/etc/hadoop
----------hadoop21--------
sending incremental file list
yarn-site.xml

sent 1028 bytes  received 43 bytes  2142.00 bytes/sec
total size is 1645  speedup is 1.54
------hadoop22--------
sending incremental file list
yarn-site.xml

sent 1028 bytes  received 43 bytes  2142.00 bytes/sec
total size is 1645  speedup is 1.54
[kgf@hadoop20 hadoop-3.1.3]$

3. Stop the NodeManagers, ResourceManager, and HistoryServer
[kgf@hadoop21 hadoop-3.1.3]$ jps
3304 ResourceManager
3131 DataNode
3436 NodeManager
3726 Jps
[kgf@hadoop21 hadoop-3.1.3]$ sbin/stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
[kgf@hadoop21 hadoop-3.1.3]$ jps
3131 DataNode
4093 Jps
[kgf@hadoop21 hadoop-3.1.3]$
[kgf@hadoop20 hadoop-3.1.3]$ mapred --daemon stop historyserver
[kgf@hadoop20 hadoop-3.1.3]$ jps
5317 Jps
4662 DataNode
4505 NameNode
[kgf@hadoop20 hadoop-3.1.3]$

4. Start the NodeManagers, ResourceManager, and HistoryServer
[kgf@hadoop21 hadoop-3.1.3]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[kgf@hadoop21 hadoop-3.1.3]$ jps
4641 Jps
4218 ResourceManager
3131 DataNode
4351 NodeManager
[kgf@hadoop21 hadoop-3.1.3]$
[kgf@hadoop20 hadoop-3.1.3]$ mapred --daemon start historyserver
[kgf@hadoop20 hadoop-3.1.3]$ jps
4662 DataNode
5542 JobHistoryServer
5383 NodeManager
4505 NameNode
5567 Jps
[kgf@hadoop20 hadoop-3.1.3]$

5. Run the WordCount example
[kgf@hadoop20 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
2022-06-14 13:58:29,712 INFO client.RMProxy: Connecting to ResourceManager at hadoop21/192.168.56.21:8032
2022-06-14 13:58:30,343 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/kgf/.staging/job_1655215095011_0001
2022-06-14 13:58:30,423 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:58:30,551 INFO input.FileInputFormat: Total input files to process : 1
2022-06-14 13:58:30,574 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:58:30,602 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:58:30,621 INFO mapreduce.JobSubmitter: number of splits:1
2022-06-14 13:58:30,732 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-06-14 13:58:30,763 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1655215095011_0001
2022-06-14 13:58:30,763 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-06-14 13:58:30,906 INFO conf.Configuration: resource-types.xml not found
2022-06-14 13:58:30,906 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-06-14 13:58:31,091 INFO impl.YarnClientImpl: Submitted application application_1655215095011_0001
2022-06-14 13:58:31,121 INFO mapreduce.Job: The url to track the job: http://hadoop21:8088/proxy/application_1655215095011_0001/
2022-06-14 13:58:31,121 INFO mapreduce.Job: Running job: job_1655215095011_0001
2022-06-14 13:58:41,290 INFO mapreduce.Job: Job job_1655215095011_0001 running in uber mode : false
2022-06-14 13:58:41,291 INFO mapreduce.Job:  map 0% reduce 0%
2022-06-14 13:58:47,396 INFO mapreduce.Job:  map 100% reduce 0%
2022-06-14 13:58:52,453 INFO mapreduce.Job:  map 100% reduce 100%
2022-06-14 13:58:52,462 INFO mapreduce.Job: Job job_1655215095011_0001 completed successfully
2022-06-14 13:58:52,767 INFO mapreduce.Job: Counters: 53
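The word counts land in /output and can be read back directly (the exact counts depend on the contents of hello.txt):

hadoop fs -cat /output/part-r-00000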
6. View the logs

With log aggregation enabled, open the job's page in the JobHistory web UI (http://hadoop20:19888/jobhistory) and follow the logs links to view each task's run logs.
A convenience script for starting and stopping the whole cluster (HDFS on hadoop20, YARN on hadoop21, plus the history server) in one command:

#!/bin/bash

if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit ;
fi

case $1 in
"start")
    echo " =================== starting the hadoop cluster ==================="

    echo " --------------- starting hdfs ---------------"
    ssh hadoop20 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop21 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop20 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== stopping the hadoop cluster ==================="

    echo " --------------- stopping historyserver ---------------"
    ssh hadoop20 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop21 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop20 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
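Saved as, say, myhadoop.sh in a directory on the PATH (the file name is illustrative; the original does not name it) and made executable, the script takes one argument:

chmod +x myhadoop.sh
myhadoop.sh start
myhadoop.sh stop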