
Building a Hadoop Distributed Cluster (based on 尚硅谷 course notes)

Modifying the hosts file

In the previous chapter, the Java environment was already configured on CentOS 7. We reuse the three CentOS 7 machines from the Elasticsearch cluster setup to build a three-node Hadoop distributed cluster, with node01 as the master and node02/node03 as workers. Reference: http://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

Node      IP address
node01    192.168.92.90
node02    192.168.92.91
node03    192.168.92.92

To make the hostnames permanent, edit the hosts file on every node to map each machine's IP to its hostname, and disable the firewall. (Note: on CentOS 7 the supported way to set a hostname is `hostnamectl set-hostname node01`; the HOSTNAME= line in /etc/sysconfig/network is a CentOS 6 convention, shown below as in the original run.)

[root@node01 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node01
[root@node01 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node01 ~]# systemctl stop firewalld
[root@node01 ~]# systemctl disable firewalld.service
[root@node02 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node02
[root@node02 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node02 ~]# systemctl stop firewalld
[root@node02 ~]# systemctl disable firewalld.service
[root@node03 ~]# vim /etc/sysconfig/network
#########
HOSTNAME=node03
[root@node03 ~]# vim /etc/hosts
#########
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
[root@node03 ~]# systemctl stop firewalld
[root@node03 ~]# systemctl disable firewalld.service
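Since the same three mappings are appended by hand on every node, the block can also be generated once and reused. A minimal sketch, assuming the node list from the table above (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: emit the shared /etc/hosts block for the three cluster nodes.
# The IPs and hostnames are the ones from the table above.
hosts_block() {
  cat <<'EOF'
192.168.92.90 node01
192.168.92.91 node02
192.168.92.92 node03
EOF
}

# In practice you would append this on each node, e.g.:
#   hosts_block >> /etc/hosts
hosts_block
```

The same output could then be pushed to the other nodes with scp instead of retyping it in vim three times.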

Configuring passwordless SSH login

Hadoop manages its nodes over SSH, so we configure passwordless SSH login, allowing node01 to log in to node02 and node03 without a password.

[root@node01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UGcKoMkBmrZQXNzVTKoyOMFEWXPfY0LmZSZ7xfSwLrI root@node01
The key's randomart image is:
+---[RSA 2048]----+
|.+===oo.*+Bo+    |
|.*o+.o.B %o..+   |
|+.* . B = . .    |
|o .o o + o       |
| .o o . S . .    |
|  . o o .        |
|       E         |
|                 |
|                 |
+----[SHA256]-----+
[root@node01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node01 ~]# ssh node01
Last login: Mon Feb 24 12:50:34 2020
[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node02:~/
root@node02's password:
id_rsa.pub                100%  393   351.3KB/s   00:00
[root@node02 ~]# mkdir ~/.ssh
[root@node02 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node02 ~]# rm -f ~/id_rsa.pub
[root@node01 ~]# ssh node02
Last login: Mon Feb 24 15:49:50 2020 from node01
[root@node02 ~]#
[root@node01 ~]# scp ~/.ssh/id_rsa.pub root@node03:~/
root@node03's password:
id_rsa.pub                100%  393   351.3KB/s   00:00
[root@node03 ~]# mkdir ~/.ssh
[root@node03 ~]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@node03 ~]# rm -f ~/id_rsa.pub
[root@node01 ~]# ssh node03
Last login: Mon Feb 24 15:45:09 2020 from 192.168.92.1
[root@node03 ~]#
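The manual scp/mkdir/cat steps above can also be compressed with `ssh-copy-id`, which appends the public key to the remote authorized_keys and fixes permissions. A hedged dry-run sketch (node names taken from above; the function is hypothetical, and DRY_RUN only previews the commands):

```shell
#!/bin/sh
# Sketch: distribute the local public key to each worker node with
# ssh-copy-id. With DRY_RUN set, the commands are only printed.
DRY_RUN=1

copy_key() {
  node="$1"
  cmd="ssh-copy-id -i ~/.ssh/id_rsa.pub root@$node"
  if [ -n "$DRY_RUN" ]; then
    echo "$cmd"            # preview only
  else
    $cmd                   # prompts once for the remote password
  fi
}

for node in node02 node03; do
  copy_key "$node"
done
```

Unset DRY_RUN to actually run it; each node will prompt for the root password once.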

Downloading Hadoop

Download the official Hadoop binary release, here version 3.2.1: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz. After downloading, extract it under /root/opt/module/hadoop/:

[root@node01 ~]# mkdir -p /root/opt/module/hadoop/
[root@node01 ~]# cd opt/module/hadoop/
[root@node01 hadoop]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
[root@node01 hadoop]# tar -zxvf hadoop-3.2.1.tar.gz
[root@node01 hadoop]# cd hadoop-3.2.1
[root@node01 hadoop-3.2.1]# ll
total 180
drwxr-xr-x. 2 hadoop hadoop    203 Sep 11 00:51 bin
drwxr-xr-x. 3 hadoop hadoop     20 Sep 10 23:58 etc
drwxr-xr-x. 2 hadoop hadoop    106 Sep 11 00:51 include
drwxr-xr-x. 3 hadoop hadoop     20 Sep 11 00:51 lib
drwxr-xr-x. 4 hadoop hadoop    288 Sep 11 00:51 libexec
-rw-rw-r--. 1 hadoop hadoop 150569 Sep 10 22:35 LICENSE.txt
-rw-rw-r--. 1 hadoop hadoop  22125 Sep 10 22:35 NOTICE.txt
-rw-rw-r--. 1 hadoop hadoop   1361 Sep 10 22:35 README.txt
drwxr-xr-x. 3 hadoop hadoop   4096 Sep 10 23:58 sbin
drwxr-xr-x. 4 hadoop hadoop     31 Sep 11 01:11 share

Modifying core-site.xml

Next, edit core-site.xml in the etc/hadoop/ directory of the Hadoop installation. Two properties are added: fs.defaultFS, which names the HDFS NameNode endpoint (the CentOS 7 machine hosting node01), and hadoop.tmp.dir, the base directory for Hadoop's working data, which must be created manually: mkdir -p /root/opt/data/tep. (The block below also sets dfs.http.address, a legacy HDFS web UI address property.)

[root@node01 hadoop-3.2.1]# cd etc/hadoop/
[root@node01 hadoop]# vim core-site.xml
#############
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/opt/data/tep</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>
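Editing in vim works, but the same file can be generated non-interactively, which is handy when repeating the setup on several machines. A sketch under assumptions: CONF_DIR is a temporary stand-in here; for a real install point it at etc/hadoop:

```shell
#!/bin/sh
# Sketch: write core-site.xml from a heredoc instead of editing it in vim.
# CONF_DIR is a stand-in directory; replace with your etc/hadoop path.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/opt/data/tep</value>
  </property>
</configuration>
EOF
# Confirm the NameNode URI landed in the file.
grep -q 'hdfs://node01:9000' "$CONF_DIR/core-site.xml" && echo "core-site.xml written"
```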

Modifying hdfs-site.xml

Next, configure the HDFS block replication factor. The default is 3, and the configured value should be no greater than the number of DataNodes in the cluster.

[root@node01 hadoop]# vim hdfs-site.xml
#############
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:9870</value>
  </property>
</configuration>
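As a quick sanity check on the rule above, here is a hedged sketch that compares a requested replication factor against the number of DataNode hosts in a workers file (the two-worker list mirrors the workers file configured later; the warning text is illustrative):

```shell
#!/bin/sh
# Sketch: warn when dfs.replication exceeds the number of DataNodes
# listed in the workers file. The sample file mimics this cluster.
workers=$(mktemp)
printf 'node02\nnode03\n' > "$workers"

replication=3
datanodes=$(grep -c . "$workers")   # count non-empty lines
if [ "$replication" -gt "$datanodes" ]; then
  echo "warning: replication=$replication exceeds $datanodes DataNodes"
else
  echo "ok"
fi
```

With only node02 and node03 acting as DataNodes, a replication factor of 3 means some blocks will stay under-replicated until more DataNodes join.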

Modifying mapred-site.xml

To ensure MapReduce uses YARN for resource management and scheduling, edit mapred-site.xml:

[root@node01 hadoop]# vim mapred-site.xml
#############
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


Modifying yarn-site.xml

Here we set the NodeManager auxiliary service to mapreduce_shuffle, whitelist the Hadoop-related environment variables that containers may inherit (needed in Hadoop 3.x), and set node01 as the ResourceManager host:

[root@node01 hadoop]# vim yarn-site.xml
#############
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
</configuration>

Configuring Hadoop environment variables

[root@node01 hadoop]# vim /etc/profile
############
export HADOOP_HOME=/root/opt/module/hadoop/hadoop-3.2.1/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@node01 hadoop]# source /etc/profile
[root@node01 hadoop]# hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /root/opt/module/hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
[root@node01 hadoop]# vim hadoop-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export HADOOP_LOG_DIR=/root/opt/data/tep
[root@node01 hadoop]# vim yarn-env.sh
############
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
[root@node01 hadoop]# vim workers
############
node02
node03
[root@node01 hadoop]# cd ../..
[root@node01 hadoop-3.2.1]# vim sbin/start-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/stop-dfs.sh
############
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/start-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
[root@node01 hadoop-3.2.1]# vim sbin/stop-yarn.sh
############
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
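Instead of patching the four sbin scripts individually, the same user variables can be declared once in etc/hadoop/hadoop-env.sh, which the start/stop scripts source; both approaches work in Hadoop 3.2.x. A hedged sketch, with a temporary file standing in for the real hadoop-env.sh:

```shell
#!/bin/sh
# Sketch: declare the HDFS/YARN start-stop users once in hadoop-env.sh
# rather than editing each sbin script. ENV_FILE is a stand-in path.
ENV_FILE=$(mktemp)
cat >> "$ENV_FILE" <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
# Show how many user declarations were written.
grep -c '_USER=root' "$ENV_FILE"
```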

Setting up ZooKeeper

ZooKeeper is an open-source, distributed coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of the Hadoop ecosystem.

Set it up as follows:

[root@node01 ~]# mkdir -p /root/opt/module/zookeeper
[root@node01 ~]# cd /root/opt/module/zookeeper
[root@node01 zookeeper]# wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# tar -zxvf apache-zookeeper-3.5.6-bin.tar.gz
[root@node01 zookeeper]# cd apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# ll
total 32
drwxr-xr-x. 2 elsearch elsearch   232 Oct  9 04:14 bin
drwxr-xr-x. 2 elsearch elsearch    70 Feb 27 11:20 conf
drwxr-xr-x. 5 elsearch elsearch  4096 Oct  9 04:15 docs
drwxr-xr-x. 2 root     root      4096 Feb 27 11:02 lib
-rw-r--r--. 1 elsearch elsearch 11358 Oct  5 19:27 LICENSE.txt
drwxr-xr-x. 2 root     root        46 Feb 27 11:17 logs
-rw-r--r--. 1 elsearch elsearch   432 Oct  9 04:14 NOTICE.txt
-rw-r--r--. 1 elsearch elsearch  1560 Oct  9 04:14 README.md
-rw-r--r--. 1 elsearch elsearch  1347 Oct  5 19:27 README_packaging.txt
drwxr-xr-x. 3 root     root        35 Feb 27 11:30 zkdata
drwxr-xr-x. 3 root     root        23 Feb 27 11:23 zklog
[root@node01 apache-zookeeper-3.5.6-bin]# pwd
/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zkdata
[root@node01 apache-zookeeper-3.5.6-bin]# mkdir zklog
[root@node01 apache-zookeeper-3.5.6-bin]# cd conf/
[root@node01 conf]# mv zoo_sample.cfg zoo.cfg
[root@node01 conf]# vim zoo.cfg
#############
dataDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zkdata
dataLogDir=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/zklog
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
[root@node01 conf]# cd ../zkdata/
[root@node01 zkdata]# echo "1" >> myid
[root@node01 zkdata]# vim /etc/profile
##############
export ZOOKEEPER_HOME=/root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/
export PATH=$PATH:$ZOOKEEPER_HOME/bin
[root@node01 zkdata]# source /etc/profile
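Each node's myid must match its server.N line in zoo.cfg, which is easy to get wrong when copying the installation around. A hedged sketch (the helper name is hypothetical) that derives the id from the server list given a node's IP:

```shell
#!/bin/sh
# Sketch: derive a node's ZooKeeper myid from the server.N entries
# in zoo.cfg, keyed by the node's own IP address.
myid_for_ip() {
  ip="$1"; cfg="$2"
  # server.N=IP:2888:3888  ->  print N for the matching IP
  sed -n "s/^server\.\([0-9][0-9]*\)=$ip:.*/\1/p" "$cfg"
}

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
server.1=192.168.92.90:2888:3888
server.2=192.168.92.91:2888:3888
server.3=192.168.92.92:2888:3888
EOF
myid_for_ip 192.168.92.91 "$cfg"   # node02 → 2
```

On a real node the result would be written to zkdata/myid.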

Starting the Hadoop and ZooKeeper clusters

Before starting the Hadoop cluster, copy the installation from the master node01 to node02 and node03. The first time the cluster is started, HDFS must be formatted.

[root@node02 ~]# mkdir -p /root/opt
[root@node03 ~]# mkdir -p /root/opt
[root@node01 ~]# # copy to the two worker nodes
[root@node01 ~]# scp -rp opt/ root@node02:/root/
[root@node01 ~]# scp -rp opt/ root@node03:/root/
[root@node01 ~]# scp -rp /etc/profile node02:/etc/profile
[root@node01 ~]# scp -rp /etc/profile node03:/etc/profile
[root@node02 ~]# source /etc/profile
[root@node03 ~]# source /etc/profile
[root@node01 ~]# cd /root/opt/module/hadoop/hadoop-3.2.1
[root@node01 hadoop-3.2.1]# # format HDFS
[root@node01 hadoop-3.2.1]# bin/hdfs namenode -format
# If this line appears in the log output, the NameNode was formatted successfully:
2020-02-24 15:21:28,893 INFO common.Storage: Storage directory /root/opt/data/tep/dfs/name has been successfully formatted.
# start Hadoop
[root@node01 hadoop-3.2.1]# sbin/start-all.sh
Run the jps command on each of the three machines to confirm the cluster started. On node01 you should see the NameNode, ResourceManager, and SecondaryNameNode processes (plus Jps itself); the workers run DataNode and NodeManager:

[root@node01 ~]# jps
3601 ResourceManager
3346 SecondaryNameNode
4074 Jps
3069 NameNode
[root@node02 ~]# jps
3473 Jps
3234 NodeManager
3114 DataNode
[root@node03 ~]# jps
3031 NodeManager
3256 Jps
2909 DataNode

Now we can open http://192.168.92.90:9870 or http://node01:9870; the Hadoop web UI is shown in Figure 9-1.

[Figure 9-1: Hadoop web UI]

In the move from Hadoop 2.x to 3.x, the default web ports changed; the NameNode web UI moved from port 50070 to 9870.

The ports that changed in Hadoop 3.x:

Category                 Service                     Hadoop 2.x   Hadoop 3.x
NameNode ports           NameNode (IPC)              8020         9820
NameNode ports           NameNode HTTP UI            50070        9870
NameNode ports           NameNode HTTPS UI           50470        9871
SecondaryNameNode ports  SecondaryNameNode HTTPS     50091        9869
SecondaryNameNode ports  SecondaryNameNode HTTP UI   50090        9868
DataNode ports           DataNode IPC                50020        9867
DataNode ports           DataNode (data transfer)    50010        9866
DataNode ports           DataNode HTTP UI            50075        9864
DataNode ports           DataNode HTTPS UI           50475        9865

We can also open http://192.168.92.90:8088/ to view the cluster status, as shown in Figure 9-2.

[Figure 9-2: Hadoop cluster status]

To stop the cluster, run sbin/stop-all.sh:

# stop Hadoop
[root@node01 hadoop-3.2.1]# sbin/stop-all.sh

At this point, the Hadoop distributed cluster is up and running.

Next, start the ZooKeeper ensemble: first correct the myid file on each node that received the copied installation, then start ZooKeeper on every node.

[root@node02 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
2
[root@node03 apache-zookeeper-3.5.6-bin]# vim zkdata/myid
3
# start ZooKeeper on each node
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh start
# jps now shows the QuorumPeerMain process for ZooKeeper
[root@node01 ~]# jps
16962 Jps
16005 NameNode
16534 ResourceManager
16903 QuorumPeerMain
16282 SecondaryNameNode
[root@node02 ~]# jps
8402 Jps
8037 NodeManager
7914 DataNode
8202 QuorumPeerMain
# check each node's election state
[root@node01 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@node03 apache-zookeeper-3.5.6-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /root/opt/module/zookeeper/apache-zookeeper-3.5.6-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
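A healthy three-node ensemble shows exactly one leader and two followers. A hedged sketch (function name is hypothetical) that tallies the Mode lines collected from the three status outputs, using sample modes mirroring the run above:

```shell
#!/bin/sh
# Sketch: tally leader/follower counts from zkServer.sh status output.
count_mode() {
  mode="$1"; shift
  printf '%s\n' "$@" | grep -c "^Mode: $mode"
}

# Sample Mode lines gathered from the three nodes.
statuses="Mode: follower
Mode: leader
Mode: follower"
leaders=$(count_mode leader "$statuses")
followers=$(count_mode follower "$statuses")
echo "leaders=$leaders followers=$followers"   # prints: leaders=1 followers=2
```

Anything other than one leader signals an election problem (check myid files and the server.N lines in zoo.cfg).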

The ZooKeeper ensemble is now up: in this run node02 is the leader, and node01 and node03 are followers.

Finally, a few simple operations with the zkCli client. (Note that in ZooKeeper 3.5.x, ls2 and rmr are deprecated in favor of `ls -s` and `deleteall`, though they still work.)

[root@node02 apache-zookeeper-3.5.6-bin]# bin/zkCli.sh
# ls: list what the ZooKeeper root currently contains
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
# ls2: list the node together with its status details
[zk: localhost:2181(CONNECTED) 1] ls2 /
[zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = -1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1
# create: create a znode
[zk: localhost:2181(CONNECTED) 2] create /zk myData
Created /zk
# get: read a znode's value
[zk: localhost:2181(CONNECTED) 3] get /zk
myData
# set: update a znode's value
[zk: localhost:2181(CONNECTED) 4] set /zk myData1
[zk: localhost:2181(CONNECTED) 5] get /zk
myData1
# create: create a child znode
[zk: localhost:2181(CONNECTED) 6] create /zk/zk01 myData2
Created /zk/zk01
# stat: check a znode's status
[zk: localhost:2181(CONNECTED) 7] stat /zk
cZxid = 0x100000008
ctime = Thu Feb 27 12:39:43 CST 2020
mZxid = 0x100000009
mtime = Thu Feb 27 12:42:37 CST 2020
pZxid = 0x10000000b
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 1
# rmr: remove a znode and its children
[zk: localhost:2181(CONNECTED) 8] rmr /zk

This gives us an introductory feel for ZooKeeper. It offers far more than shown here, including cluster management, distributed locks, distributed queues, ensemble leader election, and Java API programming.
