This post is a review exercise for the 2020 fall semester: partly to consolidate what we learned that term, and partly to let the teacher know his hard work was not in vain. Many thanks to Mr. Xu for his careful teaching…
Task 1: Install the upgraded pseudo-distributed version
HDFS → NameNode, DataNode, SecondaryNameNode
Yarn → ResourceManager, NodeManager
Task 2: Build a fully distributed Hadoop cluster (key focus ★)
→ a simple fully distributed setup
Open the virtual machine and connect with FinalShell
Start the pseudo-distributed cluster and check the processes and related state
[root@Mymaster logs]# cd /tmp/hadoop-root/dfs/data/current/
[root@Mymaster current]# ll
总用量 4
drwx------ 4 root root  54 1月  14 11:34 BP-2100367676-192.168.8.201-1610540476704
-rw-r--r-- 1 root root 229 1月  14 11:34 VERSION
[root@Mymaster current]# cd BP-2100367676-192.168.8.201-1610540476704/
[root@Mymaster BP-2100367676-192.168.8.201-1610540476704]# ll
总用量 4
drwxr-xr-x 4 root root  64 1月  13 20:52 current
-rw-r--r-- 1 root root 166 1月  13 20:24 scanner.cursor
drwxr-xr-x 2 root root   6 1月  14 11:34 tmp
[root@Mymaster BP-2100367676-192.168.8.201-1610540476704]# cd current/
[root@Mymaster current]# ll
总用量 8
-rw-r--r-- 1 root root  19 1月  13 20:52 dfsUsed
drwxr-xr-x 3 root root  21 1月  13 20:39 finalized
drwxr-xr-x 2 root root   6 1月  13 20:48 rbw
-rw-r--r-- 1 root root 132 1月  14 11:34 VERSION
[root@Mymaster current]# cd finalized/subdir0/subdir0/
[root@Mymaster subdir0]# ll
总用量 16
-rw-r--r-- 1 root root 31 1月  13 20:39 blk_1073741825
-rw-r--r-- 1 root root 11 1月  13 20:39 blk_1073741825_1001.meta
-rw-r--r-- 1 root root 43 1月  13 20:48 blk_1073741826
-rw-r--r-- 1 root root 11 1月  13 20:48 blk_1073741826_1002.meta
[root@Mymaster subdir0]# cat blk_1073741825
dfsa ddasf dfsf dfsaa sdfs aaa
[root@Mymaster subdir0]# hdfs dfs -cat /input/*
dfsa ddasf dfsf dfsaa sdfs aaa
[root@Mymaster subdir0]# pwd
/tmp/hadoop-root/dfs/data/current/BP-2100367676-192.168.8.201-1610540476704/current/finalized/subdir0/subdir0
[root@Mymaster subdir0]#
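If you want to map an HDFS file back to its block files on disk without browsing the DataNode directories by hand, fsck can print the block IDs and locations; a minimal sketch, using the same /input path as above:

# Print the files, blocks, and their locations for everything under /input;
# the block IDs (blk_...) match the file names under the DataNode's finalized/ directory
hdfs fsck /input -files -blocks -locations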
Suggestion:
Delete the plain pseudo-distributed Hadoop installed earlier
Prerequisites:
① JDK
② Upload, extract, rename
Actual steps:
① Modify the configuration files
hdfs → core-site.xml, hdfs-site.xml
yarn → yarn-site.xml, mapred-site.xml
shell → hadoop-env.sh, yarn-env.sh
② Format the NameNode
hdfs namenode -format
③ Start the Hadoop pseudo-distributed cluster
start-dfs.sh → HDFS
(NameNode, DataNode, SecondaryNameNode)
start-yarn.sh → Yarn
(ResourceManager, NodeManager)
Together, these two shell scripts are equivalent to: start-all.sh
④ Confirm the processes (a small check script follows this list):
>jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
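A minimal check sketch, assuming jps is on the PATH (it ships with the JDK), that warns about any of the five daemons that is not running:

# Check that each expected daemon shows up in the jps output
for p in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  jps | grep -qw "$p" || echo "WARNING: $p is not running"
done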
Verification:
hdfs →
① Create a directory
② Upload a file
③ Download a file
Monitoring UI: http://mymaster:50070
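A minimal command sketch of the three HDFS checks above; the /input path matches the one used later, while the local file name test.txt is just an example:

hdfs dfs -mkdir -p /input          # ① create a directory
hdfs dfs -put ./test.txt /input    # ② upload a local file
hdfs dfs -get /input/test.txt ./   # ③ download it back
hdfs dfs -ls /input                # confirm the file landed on HDFS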
yarn (wordcount example; both the source and the destination live on HDFS) →
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /input /output
Monitoring UI: http://mymaster:8088
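Besides the web UI, you can also confirm from the command line that the NodeManager has registered with the ResourceManager; a minimal sketch:

# Lists the NodeManagers known to the ResourceManager;
# in this pseudo-distributed setup there should be exactly one RUNNING node
yarn node -list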
[root@Mymaster subdir0]# jps
19441 DataNode
19221 NameNode
19701 SecondaryNameNode
43831 Jps
[root@Mymaster subdir0]# stop-dfs.sh
Stopping namenodes on [Mymaster]
Mymaster: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
[root@Mymaster subdir0]# cd
[root@Mymaster ~]# cd /opt/
[root@Mymaster opt]# rm -rf hadoop/
[root@Mymaster opt]# ll
总用量 0
drwxr-xr-x 8   10  143 255 9月  23 2016 jdk
drwxr-xr-x 2 root root  67 1月  13 16:59 soft
[root@Mymaster opt]#
[root@Mymaster opt]# cd soft/
[root@Mymaster soft]# tar -zxvf hadoop-2.7.6.tar.gz -C ../
...
[root@Mymaster soft]# cd ..
[root@Mymaster opt]# ll
总用量 0
drwxr-xr-x 9 20415 101 149 4月 18 2018 hadoop-2.7.6
drwxr-xr-x 8 10 143 255 9月 23 2016 jdk
drwxr-xr-x 2 root root 67 1月 13 16:59 soft
[root@Mymaster opt]# mv hadoop-2.7.6/ hadoop
[root@Mymaster opt]# ll
总用量 0
drwxr-xr-x 9 20415 101 149 4月 18 2018 hadoop
drwxr-xr-x 8 10 143 255 9月 23 2016 jdk
drwxr-xr-x 2 root root 67 1月 13 16:59 soft
My directory design:
[root@Mymaster opt]# mkdir -p /opt/hadoop-repo/name
[root@Mymaster opt]# mkdir -p /opt/hadoop-repo/secondary
[root@Mymaster opt]# mkdir -p /opt/hadoop-repo/data
[root@Mymaster opt]# mkdir -p /opt/hadoop-repo/tmp
A quick detour: Notepad++
Installation (omitted)
You can switch the UI language to Chinese (English is recommended)
Next, install a handy plugin → NppFTP
Use NppFTP to connect to our remote virtual machine Mymaster
Enter the password → Yes
Back to the main topic
hadoop-env.sh, yarn-env.sh
1) Configure hadoop-env.sh
(The script /etc/profile.d/bigdata.sh already defines an environment variable named JAVA_HOME, so strictly speaking this is not required.)
export JAVA_HOME=/opt/jdk
2) Configure yarn-env.sh
(If JAVA_HOME is already configured, this shell script does not need to be modified either.)
export JAVA_HOME=/opt/jdk
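For reference, here is a minimal sketch of what /etc/profile.d/bigdata.sh might contain; the exact contents are an assumption, only the /opt/jdk and /opt/hadoop paths come from this setup:

# /etc/profile.d/bigdata.sh -- hypothetical contents, adjust to your own machine
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin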
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Address of the HDFS master (NameNode) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Mymaster:8020</value>
    </property>
    <!-- Base directory for files generated while Hadoop is running -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///opt/hadoop-repo/tmp</value>
    </property>
</configuration>
Remember to press Ctrl+S to save after editing!!!
Also watch out for full-width (Chinese) spaces vs. regular English spaces!!!
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/hadoop-repo/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///opt/hadoop-repo/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///opt/hadoop-repo/secondary</value>
    </property>
    <!-- SecondaryNameNode http address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Mymaster:9001</value>
    </property>
    <!-- Number of block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Allow access to HDFS through the web UI (port 50070) -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- The way reducers fetch data is mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN master (ResourceManager) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Mymaster</value>
    </property>
    <!-- Reminder: the same process uses different ports for different protocols,
         e.g. the NameNode: hdfs protocol 8020, http protocol 50070 -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Mymaster:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Mymaster:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Mymaster:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Mymaster:8033</value>
    </property>
    <!-- http://Mymaster:8088 -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Mymaster:8088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml
Hmm, there seems to be no mapred-site.xml!!!
The file name has to be changed here, and it must be changed!!
Select the file and right-click
Remove the .template suffix
→ OK
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Tell the framework that MR runs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Address for accessing historical jobs -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Mymaster:10020</value>
    </property>
    <!-- Web address for accessing historical jobs -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Mymaster:19888</value>
    </property>
    <!-- Log output levels -->
    <property>
        <name>mapreduce.map.log.level</name>
        <value>INFO</value>
    </property>
    <property>
        <name>mapreduce.reduce.log.level</name>
        <value>INFO</value>
    </property>
</configuration>
Done!!!
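Since mapred-site.xml above configures the job history addresses (10020 / 19888), you may also want to start the JobHistoryServer once the cluster is up; a minimal sketch:

# Start the MapReduce job history server (web UI at http://mymaster:19888)
mr-jobhistory-daemon.sh start historyserver
jps | grep JobHistoryServer   # a JobHistoryServer process should appear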
[root@Mymaster ~]# hadoop namenode -format
Here we can take a look behind the scenes at the metadata files after the NameNode has been formatted.
According to the configuration files we designed above, the NN metadata is stored under /opt/hadoop-repo/name/.
When we created hadoop-repo/name/ it was empty, so everything in there now was created automatically by the NameNode format!!
Let's go in and see what's there.
Note: if you see garbled characters like the ones above, close the current connection window and open a new one to reconnect to master.
That's it for the inside story of the NameNode format.
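If you want to dig a little deeper, the fsimage file in the name directory can be dumped to readable XML with the offline image viewer; a minimal sketch (the actual fsimage file name on your machine will differ, so treat FSIMAGE_FILE as a placeholder):

cd /opt/hadoop-repo/name/current/
ls    # fsimage_..., VERSION, seen_txid, etc.
# Dump an fsimage file to XML; replace FSIMAGE_FILE with a real name from ls
hdfs oiv -p XML -i FSIMAGE_FILE -o /tmp/fsimage.xml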
To avoid errors, let's start the daemons separately first.
start-dfs.sh
If one of the processes does not come up here, try starting it on its own; if that still fails you will have to check your configuration and so on. Also, if you have broken things and want to start over, delete the NameNode metadata first (a reset sketch follows the commands below)!
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start secondarynamenode
[root@Mymaster ~]# start-yarn.sh
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
The commands to stop and start everything at once:
[root@Mymaster ~]# stop-all.sh
[root@Mymaster ~]# start-all.sh
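If the cluster really is broken and you want to reformat from scratch, here is a minimal reset sketch, assuming the hadoop-repo directories designed above (this wipes all HDFS data, so only do it in a lab environment):

stop-all.sh                        # stop every daemon first
rm -rf /opt/hadoop-repo/name/* /opt/hadoop-repo/data/* \
       /opt/hadoop-repo/secondary/* /opt/hadoop-repo/tmp/*
hdfs namenode -format              # re-format the NameNode
start-all.sh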
No problems!!!
Step 1: verify that HDFS is usable
http://mymaster:50070
Upload and viewing work fine!!
Download works fine too!!!
Step 2: verify that YARN is usable
The ResourceManager process can be reached through the browser
http://mymaster:8088/
Submit an application to the remote Hadoop pseudo-distributed cluster and run it
[root@Mymaster input]# yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount hdfs://Mymaster:8020/input hdfs://Mymaster:8020/output
No problems!!!
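To actually look at the word counts, read the result files under /output on HDFS; a minimal sketch:

hdfs dfs -ls /output                   # _SUCCESS plus part-r-00000
hdfs dfs -cat /output/part-r-00000     # the word-count pairs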
Refresh the ResourceManager page in the browser
Let's run the job once more to see the web UI while it is running
At this point, the upgraded Hadoop pseudo-distributed cluster is complete
As we said before, this cluster is a delicate thing that is not easy to please: don't get excited and just power off the machine. You must shut down the cluster first, shut down the cluster first, shut down the cluster first!!! (a shutdown sketch follows)
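A minimal sketch of the shutdown order implied above (stop the cluster, then power off the virtual machine):

stop-all.sh          # stop HDFS and Yarn first
jps                  # only Jps should remain
shutdown -h now      # now it is safe to power off the virtual machine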
And with that, this review session comes to an end!!!
Written on 2021-1-14