Script contents
- #!/bin/bash
-
- for i in hadoop102 hadoop103 hadoop104
- do
- echo "--------- $i ----------"
- ssh "$i" "$*"
- done
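The same loop can be wrapped in a function, which makes a dry run possible before touching the cluster. This is only a sketch: the function name `run_on_hosts` and the `RUNNER` variable are my own additions for illustration, not part of the original script.

```shell
#!/bin/bash
# Sketch: run one command on every node. RUNNER is a hypothetical knob;
# it defaults to ssh, and setting it to "echo" only prints what would be run.
HOSTS="hadoop102 hadoop103 hadoop104"

run_on_hosts() {
    local host
    for host in $HOSTS; do
        echo "--------- $host ----------"
        # "$*" joins all arguments into one command string for the remote shell
        ${RUNNER:-ssh} "$host" "$*"
    done
}

# Example (dry run): RUNNER=echo run_on_hosts jps
```

Calling `run_on_hosts jps` without `RUNNER` set behaves like the script above and needs passwordless ssh to all three hosts.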
For the earlier setup steps, see the previous note, "hadoop运行环境搭建(四)" (Hadoop Runtime Environment Setup, Part 4), on 丝丝呀's CSDN blog.
[zhang@hadoop102 hadoop]$ vim core-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
- <!-- Address of the NameNode -->
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://hadoop102:8020</value>
- </property>
- <!-- Directory where Hadoop stores its data -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/module/hadoop-3.1.3/data</value>
- </property>
-
- <!-- Use zhang as the static user for the HDFS web UI -->
- <property>
- <name>hadoop.http.staticuser.user</name>
- <value>zhang</value>
- </property>
-
- <!-- Hosts from which zhang (the superuser) may act as a proxy user -->
- <property>
- <name>hadoop.proxyuser.zhang.hosts</name>
- <value>*</value>
- </property>
- <!-- Groups whose members zhang (the superuser) may impersonate -->
- <property>
- <name>hadoop.proxyuser.zhang.groups</name>
- <value>*</value>
- </property>
- <!-- Users that zhang (the superuser) may impersonate -->
- <property>
- <name>hadoop.proxyuser.zhang.users</name>
- <value>*</value>
- </property>
- </configuration>
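After editing, it helps to read a value back and confirm it is what you meant to write. The helper below is a rough sketch, not a real XML parser: `get_prop` is a name I made up, and it assumes each `<name>` and `<value>` sits on its own single line, as in the file above.

```shell
#!/bin/bash
# Sketch: print the <value> of one property from a Hadoop *-site.xml file.
# Assumes <name> and <value> each sit on their own single line.
get_prop() {
    local file=$1 name=$2
    awk -v name="$name" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            # skip the 7-character <value> tag and drop the 8-character </value> tag
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Example: get_prop core-site.xml fs.defaultFS
```

On a node where Hadoop is installed, `hdfs getconf -confKey fs.defaultFS` reports the value that Hadoop itself resolves, which is the more reliable check.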
[zhang@hadoop102 hadoop]$ vim hdfs-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
- <!-- NameNode web UI address -->
- <property>
- <name>dfs.namenode.http-address</name>
- <value>hadoop102:9870</value>
- </property>
-
- <!-- SecondaryNameNode web UI address -->
- <property>
- <name>dfs.namenode.secondary.http-address</name>
- <value>hadoop104:9868</value>
- </property>
-
- <!-- Test environment: set the HDFS replication factor to 1 -->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
-
[zhang@hadoop102 hadoop]$ vim yarn-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
- <!-- Use the MapReduce shuffle auxiliary service -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
-
- <!-- Host of the ResourceManager -->
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>hadoop103</value>
- </property>
-
- <!-- Environment variables that containers inherit -->
- <property>
- <name>yarn.nodemanager.env-whitelist</name>
- <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
- </property>
-
- <!-- Minimum and maximum memory that can be allocated to a YARN container -->
- <property>
- <name>yarn.scheduler.minimum-allocation-mb</name>
- <value>512</value>
- </property>
- <property>
- <name>yarn.scheduler.maximum-allocation-mb</name>
- <value>4096</value>
- </property>
-
- <!-- Physical memory that the NodeManager may hand out to containers -->
- <property>
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>4096</value>
- </property>
-
- <!-- Disable YARN's virtual memory limit check -->
- <property>
- <name>yarn.nodemanager.vmem-check-enabled</name>
- <value>false</value>
- </property>
- </configuration>
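To get a feel for what the three memory settings mean together, here is a small arithmetic sketch. It assumes, as far as I understand the default Capacity Scheduler, that a request is rounded up to a multiple of the minimum allocation and that a request above the maximum is rejected. The function name `normalize_mb` is hypothetical.

```shell
#!/bin/bash
# Sketch: round a container request the way the scheduler is assumed to,
# using the limits from the yarn-site.xml above as defaults.
normalize_mb() {
    local req=$1 min=${2:-512} max=${3:-4096}
    if (( req > max )); then
        echo "request of ${req} MB exceeds the ${max} MB maximum" >&2
        return 1
    fi
    if (( req < min )); then
        req=$min
    fi
    # round up to the next multiple of the minimum allocation
    echo $(( (req + min - 1) / min * min ))
}

normalize_mb 1000   # prints 1024
```

With yarn.nodemanager.resource.memory-mb at 4096, a node then has room for four such 1024 MB containers.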
[zhang@hadoop102 hadoop]$ vim mapred-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
- <!-- Run MapReduce jobs on YARN -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
[zhang@hadoop102 hadoop]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
- hadoop102
- hadoop103
- hadoop104
(The workers file must not contain blank lines or spaces.)
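A stray space or empty line in this file is easy to miss by eye, so a quick check can help. The sketch below uses a made-up function name, `check_workers`, and simply greps for empty lines or any whitespace.

```shell
#!/bin/bash
# Sketch: fail if the workers file has blank lines or whitespace.
check_workers() {
    local file=$1
    # -n prints line numbers; the pattern matches empty lines or any whitespace
    if grep -nE '^$|[[:space:]]' "$file"; then
        echo "workers file has blank lines or spaces (see the lines above)" >&2
        return 1
    fi
    return 0
}

# Example: check_workers /opt/module/hadoop-3.1.3/etc/hadoop/workers
```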
Configure the history server
[zhang@hadoop102 hadoop]$ vi mapred-site.xml
- <!-- History server address -->
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>hadoop102:10020</value>
- </property>
-
- <!-- History server web UI address -->
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>hadoop102:19888</value>
- </property>
Configure log aggregation
[zhang@hadoop102 hadoop]$ vim yarn-site.xml
- <!-- Enable log aggregation -->
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- </property>
-
- <!-- URL of the log aggregation server -->
- <property>
- <name>yarn.log.server.url</name>
- <value>http://hadoop102:19888/jobhistory/logs</value>
- </property>
-
- <!-- Keep aggregated logs for 7 days -->
- <property>
- <name>yarn.log-aggregation.retain-seconds</name>
- <value>604800</value>
- </property>
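The retention value is given in seconds. The arithmetic behind 604800 can be checked in the shell; `days_to_seconds` is just an illustrative helper name.

```shell
#!/bin/bash
# Sketch: convert a retention period in days to the seconds value the property expects.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7   # prints 604800
```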
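Start the cluster (format the NameNode only on the very first start and never again afterwards: reformatting gives the NameNode a new cluster ID that the existing DataNodes no longer match)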
[zhang@hadoop102 hadoop-3.1.3]$ bin/hdfs namenode -format
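To avoid formatting by accident, the command can be guarded. The sketch below assumes the default dfs.namenode.name.dir, which resolves to dfs/name under the hadoop.tmp.dir configured earlier; `needs_format` and `NAME_DIR` are names I introduced for illustration.

```shell
#!/bin/bash
# Sketch: decide whether the NameNode still needs its one-time format.
# NAME_DIR assumes the default dfs.namenode.name.dir (${hadoop.tmp.dir}/dfs/name).
NAME_DIR="${NAME_DIR:-/opt/module/hadoop-3.1.3/data/dfs/name}"

needs_format() {
    # a formatted NameNode directory contains current/VERSION
    [ ! -f "$1/current/VERSION" ]
}

# Example:
# if needs_format "$NAME_DIR"; then
#     bin/hdfs namenode -format
# else
#     echo "NameNode already formatted, skipping"
# fi
```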
Start HDFS on hadoop102 and YARN on hadoop103 (the ResourceManager node)
[zhang@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
[zhang@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh
View the HDFS web UI at http://hadoop102:9870/
Hadoop cluster start/stop script
[zhang@hadoop102 bin]$ vim myhadoop.sh
- #!/bin/bash
- # Require exactly what we need: a start or stop argument
- if [ $# -lt 1 ]
- then
- echo "No Args Input..."
- exit 1
- fi
- case "$1" in
- "start")
- echo " =================== Starting the hadoop cluster ==================="
-
- echo " --------------- Starting hdfs ---------------"
- ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
- echo " --------------- Starting yarn ---------------"
- ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
- echo " --------------- Starting historyserver ---------------"
- ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
- ;;
- "stop")
- echo " =================== Stopping the hadoop cluster ==================="
-
- # Stop in the reverse order of startup
- echo " --------------- Stopping historyserver ---------------"
- ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
- echo " --------------- Stopping yarn ---------------"
- ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
- echo " --------------- Stopping hdfs ---------------"
- ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
- ;;
- *)
- echo "Input Args Error..."
- ;;
- esac
Make the script executable
[zhang@hadoop102 bin]$ chmod 777 myhadoop.sh
Then test the script.
Add the following to core-site.xml to enable LZO compression
- <property>
- <name>io.compression.codecs</name>
- <value>
- org.apache.hadoop.io.compress.GzipCodec,
- org.apache.hadoop.io.compress.DefaultCodec,
- org.apache.hadoop.io.compress.BZip2Codec,
- org.apache.hadoop.io.compress.SnappyCodec,
- com.hadoop.compression.lzo.LzoCodec,
- com.hadoop.compression.lzo.LzopCodec
- </value>
- </property>
-
- <property>
- <name>io.compression.codec.lzo.class</name>
- <value>com.hadoop.compression.lzo.LzoCodec</value>
- </property>
Restart the cluster with the script for the change to take effect.
Check that it works:
[zhang@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /input
[zhang@hadoop102 hadoop-3.1.3]$ hadoop fs -put README.txt /input
Test compression:
[zhang@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec /input /output
Upload a large LZO file (bigtable.lzo in the commands below) to /input.
Run the wordcount program:
[zhang@hadoop102 software]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output1
Even this large file was not split.
Build an index for the uploaded LZO file:
[zhang@hadoop102 software]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
When the job finishes, an index file has been created.
Run the WordCount program again:
[zhang@hadoop102 software]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output2
This time the input was split.
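HDFS read and write performance depends mainly on the network and the disks. To make testing easier, the network of the hadoop102, hadoop103, and hadoop104 virtual machines is limited to 100 Mbps.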
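Network speeds are quoted in megabits per second, while benchmark results are usually in megabytes per second, so it is worth converting once. The helper name `mbps_to_mbytes` is only for illustration.

```shell
#!/bin/bash
# Sketch: convert a link speed in Mbps to its ceiling in MB/s (8 bits per byte).
mbps_to_mbytes() {
    awk -v mbps="$1" 'BEGIN { printf "%.1f\n", mbps / 8 }'
}

mbps_to_mbytes 100   # prints 12.5
```

So a 100 Mbps link can carry at most 12.5 MB/s, which is the ceiling for any transfer that crosses the network.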
Test the network speed:
(1) On hadoop102, go to the /opt/module directory and start a simple HTTP server:
[zhang@hadoop102 software]$ python -m SimpleHTTPServer
(2) Open it in a browser:
hadoop102:8000 (in my browser this only opened by IP address, not by hostname; I do not know what went wrong)
[zhang@hadoop102 software]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
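There is only one replica, so the measured speed is simply the disk speed (the first replica is written locally, so it does not cross the network and does not count in the test).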
[zhang@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
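With only three servers and three replicas (the case this explanation assumes; the hdfs-site.xml above sets dfs.replication to 1), reads follow the nearest-replica rule, so all data is effectively read from local disk and nothing goes over the network.
(1) Use RandomWriter to generate random data: each node runs 10 map tasks, and each map produces about 1 GB of random binary data.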
[zhang@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar randomwriter random-data
(2) Run the Sort program:
[zhang@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar sort random-data sorted-data
(3) Verify that the data is really sorted:
[zhang@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data
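My virtual machines have too little memory to run this benchmark, so I did not run it; the steps are listed above.
HDFS parameter tuning (hdfs-site.xml)
dfs.namenode.handler.count = 20 × ln(cluster size). For example, for a cluster of 8 nodes, 20 × ln 8 ≈ 41.6, so this parameter is set to 41.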
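The value for other cluster sizes can be computed the same way. In this sketch the function name `handler_count` is made up, and the result is truncated to an integer to match the example of 41 for 8 nodes.

```shell
#!/bin/bash
# Sketch: dfs.namenode.handler.count = 20 * ln(cluster size), truncated.
handler_count() {
    # awk's log() is the natural logarithm; %d drops the fractional part
    awk -v n="$1" 'BEGIN { printf "%d\n", 20 * log(n) }'
}

handler_count 8   # prints 41
```

For the three-node cluster in these notes, `handler_count 3` gives 21.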