Prerequisites (JDK installed, static IP configured, server hostname set)
Install epel-release (essentially an extra software repository):
yum install -y epel-release
Hadoop download address: http://archive.apache.org/dist/hadoop/common/
- mkdir /opt/module
- tar -zxvf /opt/software/hadoop-3.3.5.tar.gz -C /opt/module/
- cd /etc/profile.d/
- vim my_env.sh
Contents as follows:
- #JAVA_HOME
- export JAVA_HOME=/usr/local/java/jdk1.8.0_212
- export PATH=$PATH:$JAVA_HOME/bin
-
- #HADOOP_HOME
- export HADOOP_HOME=/opt/module/hadoop-3.3.5
- export PATH=$PATH:$HADOOP_HOME/bin
- export PATH=$PATH:$HADOOP_HOME/sbin
After saving, refresh the environment variables:
source /etc/profile
hadoop version
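If the installation and environment variables are correct, the output should begin with something like the following (build details will vary):
- Hadoop 3.3.5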
- cd /opt/module/hadoop-3.3.5
- mkdir ./zyfinput
- cd zyfinput/
- vim word.txt
Contents as follows:
- zhangsan zhangsan
- lisi
- wangwu wangwu
- zhaoliu
- xiaoqi
- huba
Run:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount zyfinput/ ./zyfoutput
This produces a zyfoutput directory that holds the job results.
View the results (the count for each word):
cat ./zyfoutput/part-r-00000
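Given the word.txt contents above, the output should look roughly like this (one line per word, sorted by key):
- huba	1
- lisi	1
- wangwu	2
- xiaoqi	1
- zhangsan	2
- zhaoliu	1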
Example: sync the Hadoop directory to hadoop104 with rsync (this is the command the xsync.sh script below is built around):
rsync -av /opt/module/hadoop-3.3.5/ root@hadoop104:/opt/module/hadoop-3.3.5/
Create xsync.sh under /usr/local/bin/ (a script that syncs files and directories to the other machines):
- cd /usr/local/bin/
- vim xsync.sh
Contents as follows:
- #!/bin/bash
- #1. Check the number of arguments
- if [ $# -lt 1 ]
- then
- echo "Not enough arguments!"
- exit;
- fi
-
- #2. Loop over every machine in the cluster
- for host in hadoop102 hadoop103 hadoop104
- do
- echo ====================$host====================
- #3. Loop over every file/directory argument and send each one
- for file in "$@"
- do
- #4. Check whether the file exists
- if [ -e $file ]
- then
- #5. Get the parent directory (resolved to an absolute path)
- pdir=$(cd -P $(dirname $file); pwd)
-
- #6. Get the file name
- fname=$(basename $file)
- ssh $host "mkdir -p $pdir"
- rsync -av $pdir/$fname $host:$pdir
- else
- echo "$file does not exist!"
- fi
- done
- done
-
Refresh the environment variables and make the newly created xsync.sh executable so it takes effect:
- source /etc/profile
- chmod 777 xsync.sh
Use the xsync.sh command to sync the configuration file to hadoop103 and hadoop104:
xsync.sh /etc/profile.d/my_env.sh
Refresh the environment variables on every machine:
source /etc/profile
Go to the home directory:
cd ~
List files, including hidden ones:
ls -al
cd ./.ssh
Generate a public/private key pair:
ssh-keygen -t rsa
Copy the public key to another server (hadoop103):
ssh-copy-id hadoop103
Now hadoop102 can log in to hadoop103 without a password.
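For passwordless SSH between all three nodes, this step is normally repeated on each node for each target host. A minimal sketch, run on a node after its key pair has been generated:
- for host in hadoop102 hadoop103 hadoop104
- do
- ssh-copy-id $host
- done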
Prerequisites (three cloned machines [hadoop102/103/104] with hostname, static IP, and JDK already configured)
For hostname and static IP configuration, see the document "centos虚拟机克隆配置.docx".
Copy the Hadoop installation from hadoop102 to the hadoop103 and hadoop104 nodes (run on hadoop102):
- scp -r /opt/module/hadoop-3.3.5/ root@hadoop103:/opt/module/
-
- scp -r /opt/module/hadoop-3.3.5/ root@hadoop104:/opt/module/
Check on every machine that Hadoop was installed successfully:
hadoop version
All of the following configuration files are under the /opt/module/hadoop-3.3.5/etc/hadoop directory. The core configuration file core-site.xml is as follows:
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
-
- <!-- Address of the NameNode -->
-
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://hadoop102:8020</value>
- </property>
-
- <!-- Directory where Hadoop stores its data -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/module/hadoop-3.3.5/data</value>
- </property>
-
- <!-- Set the static user for HDFS web UI logins to root -->
- <property>
- <name>hadoop.http.staticuser.user</name>
- <value>root</value>
- </property>
- </configuration>
The HDFS configuration file hdfs-site.xml is as follows:
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <!-- NameNode web UI address -->
- <property>
- <name>dfs.namenode.http-address</name>
- <value>hadoop102:9870</value>
- </property>
- <!-- Secondary NameNode (2NN) web UI address -->
- <property>
- <name>dfs.namenode.secondary.http-address</name>
- <value>hadoop104:9868</value>
- </property>
- </configuration>
2.4. The MapReduce configuration file mapred-site.xml is as follows
Run the following two commands; their output is exactly the value to use for mapreduce.application.classpath:
export HADOOP_CLASSPATH=$(hadoop classpath)
echo $HADOOP_CLASSPATH
-
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <!-- Run MapReduce programs on YARN -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>yarn.app.mapreduce.am.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.map.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.reduce.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.application.classpath</name>
- <value>/opt/module/hadoop-3.3.5/etc/hadoop:/opt/module/hadoop-3.3.5/share/hadoop/common/lib/*:/opt/module/hadoop-3.3.5/share/hadoop/common/*:/opt/module/hadoop-3.3.5/share/hadoop/hdfs:/opt/module/hadoop-3.3.5/share/hadoop/hdfs/lib/*:/opt/module/hadoop-3.3.5/share/hadoop/hdfs/*:/opt/module/hadoop-3.3.5/share/hadoop/mapreduce/*:/opt/module/hadoop-3.3.5/share/hadoop/yarn:/opt/module/hadoop-3.3.5/share/hadoop/yarn/lib/*:/opt/module/hadoop-3.3.5/share/hadoop/yarn/*</value>
- </property>
- </configuration>
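The three hostnames below are presumably the contents of the workers file, which lists the hosts that run DataNode/NodeManager; assuming the standard location under this install, it is edited with:
- vim /opt/module/hadoop-3.3.5/etc/hadoop/workers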
hadoop102
hadoop103
hadoop104
Distribute the configuration directory to the other nodes:
xsync.sh /opt/module/hadoop-3.3.5/etc/hadoop/
Check on hadoop103 and hadoop104 that the files were distributed:
cat /opt/module/hadoop-3.3.5/etc/hadoop/core-site.xml
hdfs namenode -format (format the NameNode)
If the cluster is being started for the first time, format the NameNode on the hadoop102 node. (Note: formatting the NameNode generates a new cluster ID; if it no longer matches the DataNodes' cluster ID, the cluster cannot find its existing data. If the cluster has already been running and reports errors, and you need to re-format the NameNode, you must first stop the NameNode and DataNode processes, delete the data and logs directories on every machine, and only then format.)
After a successful format, new data and logs directories appear.
4.1. Add the following parameters at the top of start-dfs.sh and stop-dfs.sh
vim ./sbin/start-dfs.sh
vim ./sbin/stop-dfs.sh
Add the following at the top of both files (otherwise startup will report errors):
- HDFS_DATANODE_USER=root
-
- HADOOP_SECURE_DN_USER=hdfs
-
- HDFS_NAMENODE_USER=root
-
- HDFS_SECONDARYNAMENODE_USER=root
-
- YARN_RESOURCEMANAGER_USER=root
-
- YARN_NODEMANAGER_USER=root
4.2. Add the following parameters at the top of start-yarn.sh and stop-yarn.sh
vim ./sbin/start-yarn.sh
vim ./sbin/stop-yarn.sh
Add the following at the top of both files (otherwise startup will report errors):
- YARN_RESOURCEMANAGER_USER=root
-
- HADOOP_SECURE_DN_USER=yarn
-
- YARN_NODEMANAGER_USER=root
Distribute the modified scripts to the other nodes:
- xsync.sh /opt/module/hadoop-3.3.5/sbin/start-dfs.sh
-
- xsync.sh /opt/module/hadoop-3.3.5/sbin/stop-dfs.sh
-
- xsync.sh /opt/module/hadoop-3.3.5/sbin/start-yarn.sh
-
- xsync.sh /opt/module/hadoop-3.3.5/sbin/stop-yarn.sh
Start HDFS (run on the hadoop102 node):
./sbin/start-dfs.sh
Run the jps command on hadoop102
Run the jps command on hadoop103
Run the jps command on hadoop104
Start YARN (run on the hadoop103 node):
./sbin/start-yarn.sh
jps (on the hadoop103 node)
jps (on the hadoop102 node)
jps (on the hadoop104 node)
The daemons started on each node should match the planned cluster layout (see below).
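Based on the configuration above (NameNode on hadoop102, SecondaryNameNode on hadoop104, YARN started on hadoop103), the expected layout is roughly the following (plus the Jps process itself in each listing):
- hadoop102: NameNode, DataNode, NodeManager
- hadoop103: ResourceManager, NodeManager, DataNode
- hadoop104: SecondaryNameNode, DataNode, NodeManager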
View the HDFS NameNode web UI
Open in a browser: http://192.168.154.102:9870/
View the YARN ResourceManager web UI
Open in a browser: http://192.168.154.103:8088
Create an input directory on HDFS and upload the test file to it:
- hadoop fs -mkdir /zyfinput
-
- hadoop fs -put ./zyfinput/word.txt /zyfinput
The uploaded file's block is stored on disk at /opt/module/hadoop-3.3.5/data/dfs/data/current/BP-929129054-192.168.154.102-1698241909747/current/finalized/subdir0/subdir0/blk_1073741825 (the block pool ID and block number will differ on your cluster).
Run the wordcount example on the cluster (from the Hadoop install directory):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /zyfinput /zyfoutput
Progress is shown while the job runs.
When it finishes, you can view the results.
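Since the output now lives on HDFS rather than the local filesystem, it can be viewed with a command like the following (the part-r-00000 file name matches the local run earlier):
hadoop fs -cat /zyfoutput/part-r-00000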
If errors occur while the cluster is in use (for example, mismatched cluster IDs after a re-format), recover as described in the note above (see the command sketch below):
1. Stop the NameNode and DataNode processes (and YARN).
2. Delete the data and logs directories on every machine.
3. Format the NameNode again: hdfs namenode -format
4. Start the cluster again.
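A minimal command sketch of this recovery procedure, assuming the paths used in this install (run each command on the node indicated in the comment):
- ./sbin/stop-yarn.sh    # on hadoop103
- ./sbin/stop-dfs.sh     # on hadoop102
- rm -rf /opt/module/hadoop-3.3.5/data /opt/module/hadoop-3.3.5/logs    # on every node
- hdfs namenode -format    # on hadoop102
- ./sbin/start-dfs.sh    # on hadoop102
- ./sbin/start-yarn.sh   # on hadoop103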
Append the following configuration to the mapred-site.xml file:
vim ./etc/hadoop/mapred-site.xml
- <!-- Job history server address -->
-
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>hadoop102:10020</value>
- </property>
-
-
- <!-- Job history server web UI address -->
-
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>hadoop102:19888</value>
- </property>
Sync to the other nodes:
xsync.sh ./etc/hadoop/mapred-site.xml
On the hadoop103 node, stop YARN and then start it again; use jps to check that it is running:
./sbin/stop-yarn.sh
./sbin/start-yarn.sh
jps
On the hadoop102 node, start the job history server:
./bin/mapred --daemon start historyserver
jps
An extra JobHistoryServer process appears.
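Given the mapreduce.jobhistory.webapp.address setting above, the history server web UI should then be reachable at http://hadoop102:19888 (i.e. http://192.168.154.102:19888).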
Add the following configuration to the yarn-site.xml file:
vim ./etc/hadoop/yarn-site.xml
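The configuration block itself is missing here; in this kind of setup it is typically the YARN log-aggregation settings that point job logs at the history server. A sketch of what it most likely contained (the values are assumptions based on the history server address configured above):
- <!-- Enable log aggregation (assumed) -->
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- </property>
- <!-- Log server URL, pointing at the job history server (assumed) -->
- <property>
- <name>yarn.log.server.url</name>
- <value>http://hadoop102:19888/jobhistory/logs</value>
- </property>
- <!-- Keep aggregated logs for 7 days (assumed) -->
- <property>
- <name>yarn.log-aggregation.retain-seconds</name>
- <value>604800</value>
- </property>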
Sync to the other nodes:
xsync.sh ./etc/hadoop/yarn-site.xml
On the hadoop102 node, stop the history server:
mapred --daemon stop historyserver
On the hadoop103 node, stop YARN and then start it again:
- ./sbin/stop-yarn.sh
-
- ./sbin/start-yarn.sh
On the hadoop102 node, start the history server:
mapred --daemon start historyserver
Test:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /zyfinput /zyfoutput2
Delete a single file:
- hadoop fs -rm /zyfinput/word.txt
-
- hadoop fs -rm /zyfinput/*.txt
# Delete a directory and the files under it
- hadoop fs -rm -r /zyfinput/
-
- hdfs dfs -rm -r /zyfoutput
Start or stop an individual daemon:
hdfs --daemon start/stop namenode/datanode/secondarynamenode
yarn --daemon start/stop resourcemanager/nodemanager
Start or stop a whole module (run on the appropriate node):
./sbin/start-dfs.sh ./sbin/stop-dfs.sh
./sbin/start-yarn.sh ./sbin/stop-yarn.sh
The hadoop fs command is equivalent to the hdfs dfs command.
Create the sanguo directory:
hadoop fs -mkdir /sanguo
Move-upload (move shuguo.txt into the HDFS sanguo directory; the local copy is removed):
hadoop fs -moveFromLocal ./test_sanguo/shuguo.txt /sanguo
Copy-upload (upload weiguo.txt and wuguo.txt to the HDFS sanguo directory):
- hadoop fs -copyFromLocal ./test_sanguo/weiguo.txt /sanguo
-
- hadoop fs -put ./test_sanguo/wuguo.txt /sanguo
Append content (append the local liubei.txt to shuguo.txt on HDFS):
hdfs dfs -appendToFile ./test_sanguo/liubei.txt /sanguo/shuguo.txt
Download (download wuguo.txt from HDFS into the test_sanguo directory):
hadoop fs -copyToLocal /sanguo/wuguo.txt ./test_sanguo
Download with a rename (download wuguo.txt from HDFS into test_sanguo as shuguo2.txt):
hadoop fs -get /sanguo/wuguo.txt ./test_sanguo/shuguo2.txt
List an HDFS directory (the sanguo directory):
hdfs dfs -ls /sanguo
List the HDFS root directory:
hadoop fs -ls /
Show file contents (display wuguo.txt from HDFS):
hadoop fs -cat /sanguo/wuguo.txt
Change permissions (set shuguo.txt to 666):
hadoop fs -chmod 666 /sanguo/shuguo.txt
Change file ownership and permissions (-chgrp, -chmod, -chown); these work the same as on a Linux filesystem:
hadoop fs -chown zyf:zyf /sanguo/shuguo.txt
Create a path (create the jinguo directory):
hadoop fs -mkdir /jinguo
Copy from one HDFS path to another (copy from sanguo to jinguo):
hadoop fs -cp /sanguo/shuguo.txt /jinguo
Move from one HDFS path to another (move wuguo.txt from sanguo to jinguo):
hadoop fs -mv /sanguo/wuguo.txt /jinguo
Show the last 1 KB of an HDFS file:
hdfs dfs -tail /sanguo/weiguo.txt
Delete files:
- hdfs dfs -rm /jinguo/*
-
- hdfs dfs -rm /jinguo/shuguo.txt
Delete a directory and everything in it (recursive delete):
hadoop fs -rm -r /jinguo
Show size statistics for a directory:
- hdfs dfs -du /sanguo
-
- hdfs dfs -du -h /sanguo
-
- hdfs dfs -du -s -h /sanguo
Set the replication factor of a file in HDFS (a value of 5 can be requested, but it only takes physical effect once the cluster has at least 5 DataNodes):
hadoop fs -setrep 5 /sanguo/wuguo.txt
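To verify the requested replication factor afterwards, one option (assuming the file is still at that path) is:
hadoop fs -stat %r /sanguo/wuguo.txt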