Hadoop clusters support three modes:
Local (standalone) mode
Pseudo-distributed mode
Fully distributed mode
[root@server1 ~]# useradd hadoop  create the hadoop user
[root@server1 ~]# su - hadoop  deploy as a regular user rather than as root
[hadoop@server1 ~]$ lftp 172.25.254.50
lftp 172.25.254.50:/> cd pub/
lftp 172.25.254.50:/pub> ls
-rwxr-xr-x 1 0 0 359196911 May 31 10:35 hadoop-3.2.1.tar.gz
-rwxr-xr-x 1 0 0 185646832 May 31 10:34 jdk-8u181-linux-x64.tar
lftp 172.25.254.50:/pub> get hadoop-3.2.1.tar.gz  download the Hadoop tarball
359196911 bytes transferred
lftp 172.25.254.50:/pub> get jdk-8u181-linux-x64.tar.gz  download the JDK, which Hadoop requires
185646832 bytes transferred
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz  extract after downloading
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java  create a symlink for convenience
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz  extract
[hadoop@server1 ~]$ ln -s hadoop-3.2.1 hadoop  create a symlink
[hadoop@server1 ~]$ cd /home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim hadoop-env.sh  set the environment variables
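The exact lines added to hadoop-env.sh are not shown above; with the symlinks created earlier they would typically look like this (paths are assumptions based on this setup):
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop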
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hadoop  run it to verify the installation
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'  run the examples jar (it contains many example programs); grep the input files for strings matching the pattern and write the results to the output directory
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000 _SUCCESS  the results are written to the output directory
[root@server1 ~]# passwd hadoop  set a password for the hadoop user
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
[hadoop@server1 ~]$ ssh-keygen  set up passwordless SSH
[hadoop@server1 ~]$ ssh-copy-id server1
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml  add the following at the end of the file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>  address of the HDFS master (NameNode)
</property>
</configuration>
[hadoop@server1 hadoop]$ ssh localhost  passwordless login works
[hadoop@server1 hadoop]$ vim hdfs-site.xml  add the following at the end of the file
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>  set the replication factor to 1 (the default is 3)
</property>
</configuration>
[hadoop@server1 ~]$ cd /home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format  format the NameNode
[hadoop@server1 hadoop]$ ls /tmp/  default data directory
hadoop hadoop-hadoop hadoop-hadoop-namenode.pid hsperfdata_hadoop
[hadoop@server1 hadoop]$ sbin/start-dfs.sh  start the HDFS daemons
[hadoop@server1 ~]$ vim .bash_profile  add the java command to PATH
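The PATH change itself is not shown in the original; given the java symlink created above it would presumably be a line such as:
export PATH=$PATH:$HOME/java/bin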
[hadoop@server1 ~]$ source .bash_profile  apply the change
[hadoop@server1 ~]$ jps  list the Java processes
4196 Jps
3957 SecondaryNameNode  performs checkpointing (merging the edit log into the fsimage); it is not a hot standby for the NameNode
3659 NameNode  master process
3772 DataNode
Browse to 172.25.50.1:9870 (NameNode web UI)
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report  show an overview of the distributed file system
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/  create the /user directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop  create the hadoop user's home directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input  upload the input directory
everything in the local input directory is now in HDFS
[hadoop@server1 hadoop]$ rm -fr input/  remove the local input directory
[hadoop@server1 hadoop]$ rm -fr output/  remove the local output directory
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output  count word frequencies; the local input directory is gone, so the data is read from HDFS
an output directory is generated, as shown below
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls  list the home directory: two entries
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2022-05-31 23:29 input
drwxr-xr-x - hadoop supergroup 0 2022-06-01 01:18 output
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*  view everything under output
"*" 21
"AS 9
"License"); 9
"alice,bob 21
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
-1 1
-1, 1
0.0 1
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output  alternatively, download output to the local filesystem
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ cat *  view it
"*" 21
"AS 9
"License"); 9
"alice,bob 21
"clumping" 1
(ASF) 1
(root 1
(the 9
--> 18
-1 1
Fully distributed mode needs three virtual machines: server1 as the master node, server2 and server3 as worker nodes.
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh  stop the HDFS daemons
The three machines must have identical configuration, so keep it in sync via an NFS share. Install the NFS utilities on all nodes:
[root@server1 ~]# yum install -y nfs-utils
[root@server2 ~]# yum install -y nfs-utils
[root@server3 ~]# yum install -y nfs-utils
[root@server1 ~]# id hadoop  check the hadoop user's uid/gid
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
[root@server1 ~]# vim /etc/exports
/home/hadoop *(rw,anonuid=1001,anongid=1001)  export this directory; anonymous access is mapped to uid/gid 1001
[root@server1 ~]# systemctl start nfs  start NFS
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *  the directory is exported
[root@server2 ~]# useradd hadoop  create the hadoop user on server2; the uid/gid must match across nodes (1001 everywhere)
[root@server3 ~]# useradd hadoop  create the hadoop user on server3; the uid/gid must match across nodes (1001 everywhere)
[root@server2 ~]# mount 172.25.50.1:/home/hadoop/ /home/hadoop/  mount the share
[root@server3 ~]# mount 172.25.50.1:/home/hadoop/ /home/hadoop/  mount the share
server1, server2 and server3 now see exactly the same data
[hadoop@server1 ~]$ ssh server2  passwordless login works
[hadoop@server1 ~]$ ssh server3  passwordless login works
[hadoop@server1 ~]$ ssh 172.25.50.1  passwordless login works
[hadoop@server1 ~]$ rm -fr /tmp/*  remove the old contents of the default data directory
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
[hadoop@server1 hadoop]$ vim workers  add server2 and server3 as workers
server2  hostnames require name resolution (e.g. /etc/hosts entries)
server3
[hadoop@server1 hadoop]$ vim hdfs-site.xml
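The edits to core-site.xml and hdfs-site.xml are not shown at this step; following the pseudo-distributed settings above, they would presumably point fs.defaultFS at server1 and set the replication factor to 2 (there are two DataNodes, and the HDFS listings further down show 2 replicas). A sketch, with the address as an assumption:
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.25.50.1:9000</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>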
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format  format the NameNode
[hadoop@server1 hadoop]$ sbin/start-dfs.sh  start HDFS
[hadoop@server1 hadoop]$ jps  the NameNode process is running
5657 Jps
5308 NameNode
5533 SecondaryNameNode
[hadoop@server2 ~]$ jps  the DataNode process is running
4336 DataNode
4543 Jps
[hadoop@server3 ~]$ jps  the DataNode process is running
4370 Jps
4307 DataNode
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user  create the /user directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop  create the hadoop user's home directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input  create the input directory under the home directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls  list the home directory
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2022-06-01 06:42 input  created successfully
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input  upload
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls input  list the input directory; the xml files were uploaded
Found 9 items
-rw-r--r-- 2 hadoop supergroup 8260 2022-06-01 06:45 input/capacity-scheduler.xml
-rw-r--r-- 2 hadoop supergroup 886 2022-06-01 06:45 input/core-site.xml
-rw-r--r-- 2 hadoop supergroup 11392 2022-06-01 06:45 input/hadoop-policy.xml
-rw-r--r-- 2 hadoop supergroup 867 2022-06-01 06:45 input/hdfs-site.xml
-rw-r--r-- 2 hadoop supergroup 620 2022-06-01 06:45 input/httpfs-site.xml
-rw-r--r-- 2 hadoop supergroup 3518 2022-06-01 06:45 input/kms-acls.xml
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output  count word frequencies
Bring up another virtual machine, server4:
[root@server4 ~]# useradd hadoop
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# mount 172.25.50.1:/home/hadoop/ /home/hadoop/  mount the share so all data stays consistent
[hadoop@server4 ~]$ cd hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ vim workers
server2
server3
server4
[hadoop@server4 hadoop]$ bin/hdfs --daemon start datanode  start the DataNode daemon
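To confirm that server4 actually joined the cluster, the report command used earlier can be run again on the master:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report  the new server4 DataNode should now appear in the list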
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml
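The change made to hdfs-site.xml here is not shown; to make the exclude file below take effect it would typically add a dfs.hosts.exclude property pointing at that file, something like (the path is an assumption for this layout):
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop/etc/hadoop/hosts.exclude</value>
</property>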
[hadoop@server1 hadoop]$ vim hosts.exclude
server2  list server2 here to exclude (decommission) it
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -refreshNodes  refresh the nodes so the new configuration is read
Refresh nodes successful
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
[hadoop@server1 hadoop]$ bin/hdfs --daemon stop datanode  alternatively, a node can be taken offline by stopping the DataNode daemon on it
The three replicas are stored in different locations; if the client is itself a cluster node, the first replica is always written to the local node.
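To check where the replicas of a particular file actually ended up, block locations can be inspected with fsck (the path here is just an example):
[hadoop@server1 hadoop]$ bin/hdfs fsck /user/hadoop/input/core-site.xml -files -blocks -locations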
Add the MapReduce configuration:
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim mapred-site.xml  add the following at the end of the file
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
Add the YARN configuration:
[hadoop@server1 hadoop]$ vim yarn-site.xml  add the following at the end of the file
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/start-yarn.sh  start YARN
Starting resourcemanager
Starting nodemanagers
After YARN starts, port 8088 is opened on the master node; the ResourceManager web UI can be reached there in a browser.
Stop the daemons:
[hadoop@server1 hadoop]$ sbin/stop-yarn.sh
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
[hadoop@server1 hadoop]$ rm -fr /tmp/*  clear the data under /tmp
Bring up another virtual machine, server5. Memory is limited, so server3, 4 and 5 each get 1 GB.
server1 and server5 provide the highly available NameNode pair; server2, 3 and 4 form the ZooKeeper cluster.
[root@server5 ~]# useradd hadoop  create the hadoop user
[root@server5 ~]# yum install nfs-utils -y  install the NFS utilities
[root@server5 ~]# mount 172.25.50.1:/home/hadoop/ /home/hadoop/  mount the share
[hadoop@server1 ~]$ lftp 172.25.254.50
lftp 172.25.254.50:/pub> get zookeeper-3.4.9.tar.gz  download the ZooKeeper tarball
[hadoop@server1 ~]$ tar zxf zookeeper-3.4.9.tar.gz  extract
[hadoop@server2 ~]$ cd zookeeper-3.4.9/
[hadoop@server2 zookeeper-3.4.9]$ cd conf/
[hadoop@server2 conf]$ cp zoo_sample.cfg zoo.cfg  copy the template to create the main config file
[hadoop@server2 conf]$ vim zoo.cfg  edit the config; add the ZooKeeper cluster nodes at the end of the file
server.1=172.25.50.2:2888:3888  the 1 in server.1 is the server id, not a hostname; 2888 is the data-sync/communication port and 3888 the leader-election port
server.2=172.25.50.3:2888:3888
server.3=172.25.50.4:2888:3888
[hadoop@server2 conf]$ mkdir /tmp/zookeeper  create the ZooKeeper data directory
[hadoop@server2 conf]$ echo 1 > /tmp/zookeeper/myid  create myid in the data directory; its value must match the server id in the config file
[hadoop@server3 ~]$ mkdir /tmp/zookeeper
[hadoop@server3 ~]$ echo 2 > /tmp/zookeeper/myid
[hadoop@server4 ~]$ mkdir /tmp/zookeeper
[hadoop@server4 ~]$ echo 3 > /tmp/zookeeper/myid
[hadoop@server2 ~]$ cd zookeeper-3.4.9/
[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh start  start the ZooKeeper daemon
[hadoop@server3 ~]$ cd zookeeper-3.4.9/
[hadoop@server3 zookeeper-3.4.9]$ bin/zkServer.sh start  start the ZooKeeper daemon
[hadoop@server4 ~]$ cd zookeeper-3.4.9/
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh start  start the ZooKeeper daemon
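Each node's role in the ensemble can be checked with the status subcommand; one of the three should report leader and the other two follower:
[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh status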
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>  change this to the nameservice ID (masters, as defined in hdfs-site.xml) rather than the address of one specific master
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.25.50.2:2181,172.25.50.3:2181,172.25.50.4:2181</value>  the ZooKeeper ensemble connection addresses
</property>
</configuration>
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml  edit the HDFS config file
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>  set the replication factor to 3; there are now three DataNodes
</property>
<property>
<name>dfs.nameservices</name>
<value>masters</value>  define the nameservice ID as masters; it must match the setting in core-site.xml
</property>
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>  the two NameNodes of the masters nameservice are h1 and h2
</property>
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>172.25.50.1:9000</value>  RPC (communication) address of NameNode h1
</property>
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>172.25.50.1:9870</value>  HTTP (web UI) address of NameNode h1
</property>
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>172.25.50.5:9000</value>  RPC (communication) address of NameNode h2
</property>
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>172.25.50.5:9870</value>  HTTP (web UI) address of NameNode h2
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.25.50.2:8485;172.25.50.3:8485;172.25.50.4:8485/masters</value>  where the NameNode edit log (metadata) is stored on the JournalNodes
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/journaldata</value>  local storage path on each JournalNode
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>  enable automatic failover when the active NameNode fails
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>  the proxy provider that implements automatic failover
</property>
<property>
<name>dfs.ha.fencing.methods</name>  fencing mechanism
<value>
sshfence  two fencing methods: ssh and shell
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>  the sshfence method needs passwordless SSH, hence the private key
</property>
</configuration>
Start the journalnode on each of the three DN nodes (the first time HDFS is started, the journalnodes must be started first):
[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ bin/hdfs --daemon start journalnode
[hadoop@server3 ~]$ cd hadoop
[hadoop@server3 hadoop]$ bin/hdfs --daemon start journalnode
[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ bin/hdfs --daemon start journalnode
[hadoop@server1 hadoop]$ cd hadoop
Format the HDFS cluster:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
The NameNode metadata is stored under /tmp by default; it has to be copied to h2 (i.e. from server1 to server5):
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.50.5:/tmp
Format ZooKeeper (this only needs to be run on h1):
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK
Start the HDFS cluster:
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Test: open the master nodes' web UIs in a browser.
server1 is shown as "active" (the primary) and server5 as "standby" (the backup).
Upload data from the active master node (server1):
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input  // create the directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input  // upload data
Simulate a failure of the server1 node:
Kill the NameNode process on server1.
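One straightforward way to do this (a sketch; the PID placeholder must be replaced with the value jps actually prints on server1):
[hadoop@server1 hadoop]$ jps  find the NameNode PID
[hadoop@server1 hadoop]$ kill -9 <NameNode PID>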
In the browser, server5 has now become the "active" node; service has failed over from server1 to server5.
When the server1 node recovers:
After server1 comes back, it joins as the standby node, i.e. it becomes the backup. Put simply: whichever NameNode registers with ZooKeeper first becomes the active master; the others become standby masters.
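The current role of each NameNode can also be queried from the command line with haadmin, using the IDs h1 and h2 defined in hdfs-site.xml:
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h1
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h2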