
One Article to Get Your Spark Setup Done

I. Hadoop Platform Installation

1. Experiment Environment

Server cluster: single node; minimum machine configuration: dual-core CPU, 8 GB RAM, 100 GB disk
Operating system: CentOS 7.4
Services and components: installed as required by each experiment

2. Experiment Procedure

1) Check the IP address
  1. [root@localhost ~]# ip add show
  2. 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
  3. qlen 1
  4. link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  5. inet 127.0.0.1/8 scope host lo
  6. valid_lft forever preferred_lft forever
  7. inet6 ::1/128 scope host
  8. valid_lft forever preferred_lft forever
  9. 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
  10. state UP qlen 1000
  11. link/ether 00:0c:29:b7:35:be brd ff:ff:ff:ff:ff:ff
  12. inet 192.168.47.140/24 brd 192.168.47.255 scope global dynamic ens33
  13. valid_lft 1460sec preferred_lft 1460sec
  14. inet6 fe80::29cc:5498:c98a:af4b/64 scope link
  15. valid_lft forever preferred_lft forever

2) Set the server hostname
  1. [root@localhost ~]# hostnamectl set-hostname master
  2. [root@localhost ~]# bash
  3. [root@master ~]# hostname
  4. master

3) Bind the hostname to the IP address
  1. [root@master ~]# vi /etc/hosts
  2. 127.0.0.1 localhost localhost.localdomain localhost4
  3. localhost4.localdomain4
  4. ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  5. 192.168.47.140 master

4) Check the SSH service status
  1. [root@master ~]# systemctl status sshd
  2. ● sshd.service - OpenSSH server daemon
  3. Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor
  4. preset: enabled)
  5. Active: active (running) since 一 2021-12-20 08:22:16 CST; 10 months 21
  6. days ago
  7. Docs: man:sshd(8)
  8. man:sshd_config(5)
  9. Main PID: 1048 (sshd)
  10. CGroup: /system.slice/sshd.service
  11. └─1048 /usr/sbin/sshd -D

5) Disable the firewall
  1. [root@master ~]# systemctl stop firewalld
  2. After stopping the firewall, check its status to confirm it is down:
  3. [root@master ~]# systemctl status firewalld
  4. Seeing inactive (dead) means the firewall has been stopped. With this setting alone, however, the
  5. firewall will start again when the Linux system reboots. Run the following command to disable it permanently:
  6. [root@master ~]# systemctl disable firewalld
  7. Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
  8. Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

6) Create the hadoop user
  1. [root@master ~]# useradd hadoop
  2. [root@master ~]# echo "1" |passwd --stdin hadoop
  3. Changing password for user hadoop.
  4. passwd: all authentication tokens updated successfully.

II. Installing the Java Environment

1. Download the JDK package

The JDK package must be downloaded from the Oracle website at https://www.oracle.com/java/technologies/javase-jdk8-downloads.html. Hadoop 2.7.1, the version used in this guide, requires JDK 7 or later; the package used here is jdk-8u152-linux-x64.tar.gz.

2. Remove the bundled OpenJDK

  1. [root@master ~]# rpm -qa | grep java
  2. javapackages-tools-3.4.1-11.el7.noarch
  3. java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64
  4. tzdata-java-2022e-1.el7.noarch
  5. python-javapackages-3.4.1-11.el7.noarch
  6. java-1.8.0-openjdk-headless-1.8.0.352.b08-2.el7_9.x86_64
  7. Remove these packages with the following commands:
  8. [root@master ~]# rpm -e --nodeps javapackages-tools-3.4.1-11.el7.noarch
  9. [root@master ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.352.b08-
  10. 2.el7_9.x86_64
  11. [root@master ~]# rpm -e --nodeps tzdata-java-2022e-1.el7.noarch
  12. [root@master ~]# rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch
  13. [root@master ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.352.b08-
  14. 2.el7_9.x86_64
  15. [root@master ~]# rpm -qa | grep java
  16. To verify the removal, run java -version again; output like the following means the removal succeeded:
  17. [root@master ~]# java -version
  18. bash: java: command not found

3. Install the JDK

Hadoop 2.7.1 requires JDK 1.7 or later; JDK 1.8 (that is, Java 8) is installed here.
The installation commands below extract the package into the /usr/local/src directory. Note that the
software packages are expected to be available in /opt/software beforehand (see the upload sketch below).
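If the package is not yet on the server, one way to stage it (a sketch only; the local download path and the use of root SSH login are assumptions) is to create /opt/software on master and upload the tarball with scp from the machine where it was downloaded:

  # On master: create the staging directory
  [root@master ~]# mkdir -p /opt/software
  # On the workstation that holds the download (path is hypothetical)
  $ scp ~/Downloads/jdk-8u152-linux-x64.tar.gz root@192.168.47.140:/opt/software/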
  1. [root@master ~]# tar -zxvf /opt/software/jdk-8u152-linux-x64.tar.gz -C
  2. /usr/local/src/
  3. [root@master ~]# ls /usr/local/src/
  4. jdk1.8.0_152

4. Set the Java environment variables

There are several ways to set environment variables on Linux. The two most common are editing the /etc/profile file, which takes effect system-wide for all users, and editing the ~/.bashrc file, which takes effect only for the current user. The first method is used here.
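For comparison, a per-user sketch of the second method (it assumes the same variable values used below and affects only the hadoop user) would append the two lines to that user's ~/.bashrc instead:

  # Append the variables to the hadoop user's own shell profile
  [hadoop@master ~]$ echo 'export JAVA_HOME=/usr/local/src/jdk1.8.0_152' >> ~/.bashrc
  [hadoop@master ~]$ echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
  [hadoop@master ~]$ source ~/.bashrc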
  1. [root@master ~]# vi /etc/profile
  2. Add the following two lines at the end of the file:
  3. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  4. export PATH=$PATH:$JAVA_HOME/bin
  5. Run source to make the settings take effect:
  6. [root@master ~]# source /etc/profile
  7. Check that Java is available:
  8. [root@master ~]# echo $JAVA_HOME
  9. /usr/local/src/jdk1.8.0_152
  10. [root@master ~]# java -version
  11. java version "1.8.0_152"
  12. Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
  13. Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
  14. If the Java version is displayed correctly, the JDK has been installed and configured successfully.

III. Installing the Hadoop Software

1. Obtain the Hadoop package

All Apache Hadoop releases can be downloaded from https://archive.apache.org/dist/hadoop/common/. This guide uses Hadoop 2.7.1, packaged as hadoop-2.7.1.tar.gz. Download the package and upload it to the /opt/software directory on the Linux system; the procedure is the same as in the earlier section "Experiment 1: Linux operating system environment setup" and is not repeated here (a download sketch follows below).
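If the master node has Internet access, a minimal download sketch (the URL follows the standard Apache archive layout) is:

  [root@master ~]# mkdir -p /opt/software
  [root@master ~]# wget -P /opt/software https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Otherwise, download it on a workstation and upload it with scp to /opt/software, as was done for the JDK package.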

2. Install the Hadoop software

1) Install the Hadoop software
  1. The installation commands below extract the package into the /usr/local/src/ directory:
  2. [root@master ~]# tar -zxvf /opt/software/hadoop-2.7.1.tar.gz -C
  3. /usr/local/src/
  4. [root@master ~]# ll /usr/local/src/
  5. 总用量 0
  6. drwxr-xr-x. 9 10021 10021 149 6月 29 2015 hadoop-2.7.1
  7. drwxr-xr-x. 8 10 143 255 9月 14 2017 jdk1.8.0_152
  8. The contents of the Hadoop directory are as follows:
  9. [root@master ~]# ll /usr/local/src/hadoop-2.7.1/
  10. 总用量 28
  11. drwxr-xr-x. 2 10021 10021 194 6月 29 2015 bin
  12. drwxr-xr-x. 3 10021 10021 20 6月 29 2015 etc
  13. drwxr-xr-x. 2 10021 10021 106 6月 29 2015 include
  14. drwxr-xr-x. 3 10021 10021 20 6月 29 2015 lib
  15. drwxr-xr-x. 2 10021 10021 239 6月 29 2015 libexec
  16. -rw-r--r--. 1 10021 10021 15429 6月 29 2015 LICENSE.txt
  17. -rw-r--r--. 1 10021 10021 101 6月 29 2015 NOTICE.txt
  18. -rw-r--r--. 1 10021 10021 1366 6月 29 2015 README.txt
  19. drwxr-xr-x. 2 10021 10021 4096 6月 29 2015 sbin
  20. drwxr-xr-x. 4 10021 10021 31 6月 29 2015 share

Explanation of the Hadoop directory contents (a quick sanity check follows this list):
bin: executables and administration tools for Hadoop, HDFS, YARN, and MapReduce.
etc: Hadoop configuration files.
include: header files, similar to C header files.
lib: native library files, used for compressing and decompressing data.
libexec: serves the same purpose as lib.
sbin: scripts for starting and stopping the Hadoop cluster.
share: documentation, examples, and dependency jar files.
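As that quick check (a sketch; it relies on the JAVA_HOME already exported in /etc/profile), the version subcommand under bin can be run before any Hadoop environment variables are configured:

  [root@master ~]# /usr/local/src/hadoop-2.7.1/bin/hadoop version
  # The first line of the output should report Hadoop 2.7.1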

2) Configure the Hadoop environment variables
  1. As with the Java environment variables, edit the /etc/profile file:
  2. [root@master ~]# vi /etc/profile
  3. Add the following two lines at the end of the file:
  4. export HADOOP_HOME=/usr/local/src/hadoop-2.7.1
  5. export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  6. Run source to make the settings take effect:
  7. [root@master ~]# source /etc/profile
  8. Check whether the settings work:
  9. [root@master ~]# hadoop
  10. Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  11. CLASSNAME run the class named CLASSNAME
  12. or
  13. where COMMAND is one of:
  14. fs run a generic filesystem user client
  15. version print the version
  16. jar <jar> run a jar file
  17. note: please use "yarn jar" to launch
  18. YARN applications, not this command.
  19. checknative [-a|-h] check native hadoop and compression libraries availability
  20. distcp <srcurl> <desturl> copy file or directories recursively
  21. archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  22. classpath prints the class path needed to get the Hadoop jar and the required libraries
  23. credential interact with credential providers
  24. daemonlog get/set the log level for each daemon
  25. trace view and modify Hadoop tracing settings
  26. Most commands print help when invoked w/o parameters.
  27. [root@master ~]#
  28. If the Hadoop usage information above is displayed, Hadoop has been installed successfully.

3) Change the directory owner and group
The Hadoop software installed above can only be used by root so far. To let the hadoop user run it,
change the owner of the /usr/local/src directory to the hadoop user.
  1. [root@master ~]# chown -R hadoop:hadoop /usr/local/src/
  2. [root@master ~]# ll /usr/local/src/
  3. 总用量 0
  4. drwxr-xr-x. 9 hadoop hadoop 149 6月 29 2015 hadoop-2.7.1
  5. drwxr-xr-x. 8 hadoop hadoop 255 9月 14 2017 jdk1.8.0_152
  6. The owner of the /usr/local/src directory has been changed to hadoop.

IV. Installing Hadoop in Standalone Mode

1. Edit the Hadoop configuration file

  1. [root@master ~]# cd /usr/local/src/hadoop-2.7.1/
  2. [root@master hadoop-2.7.1]# ls
  3. bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
  4. [root@master hadoop-2.7.1]# vi etc/hadoop/hadoop-env.sh
  5. Find the export JAVA_HOME line in the file and change it to the following:
  6. export JAVA_HOME=/usr/local/src/jdk1.8.0_152

2. Test Hadoop in local mode

1) Switch to the hadoop user
  1. [root@master hadoop-2.7.1]# su - hadoop
  2. [hadoop@master ~]$ id
  3. uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop) context
  4. =unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

2) Create a directory for the input data
  1. The input data will be stored in ~/input (the input directory under the hadoop user's home directory).
  2. [hadoop@master ~]$ mkdir ~/input
  3. [hadoop@master ~]$ ls
  4. input

3) Create the input data file
  1. Create the data file data.txt and put the test data into it.
  2. [hadoop@master ~]$ vi input/data.txt
  3. Enter the following content, then save and exit:
  4. Hello World
  5. Hello Hadoop
  6. Hello Husan

4) Test a MapReduce run
  1. [hadoop@master ~]$ hadoop jar /usr/local/src/hadoop-
  2. 2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
  3. wordcount ~/input/data.txt ~/output
  4. The results are written to the ~/output directory (note: the output directory must not already
  5. exist; see the note after this listing). After the command finishes, check the results:
  6. [hadoop@master ~]$ ll output/
  7. 总用量 4
  8. -rw-r--r--. 1 hadoop hadoop 33 11月 10 23:50 part-r-00000
  9. -rw-r--r--. 1 hadoop hadoop 0 11月 10 23:50 _SUCCESS
  10. The _SUCCESS file indicates that the job succeeded; the results themselves are stored in the
  11. part-r-00000 file. View that file:
  12. [hadoop@master ~]$ cat output/part-r-00000
  13. Hadoop 1
  14. Hello 3
  15. Husan 1
  16. World 1
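If the job needs to be re-run, the output directory has to be removed first, because Hadoop refuses to write into an output directory that already exists. A short sketch:

  [hadoop@master ~]$ rm -rf ~/output
  # then run the hadoop jar ... wordcount command again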

3. Hadoop platform environment setup

1) Cluster network configuration for the experiment environment
  1. Change the hostname of the slave1 machine:
  2. [root@localhost ~]# hostnamectl set-hostname slave1
  3. [root@localhost ~]# bash
  4. [root@slave1 ~]#
  5. Change the hostname of the slave2 machine:
  6. [root@localhost ~]# hostnamectl set-hostname slave2
  7. [root@localhost ~]# bash
  8. [root@slave2 ~]#
  9. IP address plan for the experiment cluster (adjust to the addresses of your own hosts):
  10. master uses IP address 192.168.47.140 with netmask 255.255.255.0;
  11. slave1 uses IP address 192.168.47.141 with netmask 255.255.255.0;
  12. slave2 uses IP address 192.168.47.142 with netmask 255.255.255.0.
  13. The hostnames chosen for Hadoop are master, slave1, and slave2, mapping to the IP addresses
  14. 192.168.47.140, 192.168.47.141, and 192.168.47.142. Edit the /etc/hosts file on each host accordingly
  15. (a quick connectivity check follows after this listing). Enter the following commands in a terminal:
  16. [root@master ~]# vi /etc/hosts
  17. 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  18. ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  19. 192.168.47.140 master
  20. 192.168.47.141 slave1
  21. 192.168.47.142 slave2
  22. [root@slave1 ~]# vi /etc/hosts
  23. 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  24. ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  25. 192.168.47.140 master
  26. 192.168.47.141 slave1
  27. 192.168.47.142 slave2
  28. [root@slave2 ~]# vi /etc/hosts
  29. 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  31. ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  32. 192.168.47.140 master
  33. 192.168.47.141 slave1
  34. 192.168.47.142 slave2
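A quick connectivity sketch to confirm that the hostname mappings work (run on master; the same can be done from each slave toward the other nodes):

  [root@master ~]# ping -c 2 slave1
  [root@master ~]# ping -c 2 slave2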

4. Passwordless SSH configuration

1) Generate the SSH keys
  • Make sure SSH is installed and running on every node
  1. [root@master ~]# rpm -qa | grep openssh
  2. openssh-server-7.4p1-11.el7.x86_64
  3. openssh-7.4p1-11.el7.x86_64
  4. openssh-clients-7.4p1-11.el7.x86_64
  5. [root@master ~]# rpm -qa | grep rsync
  6. rsync-3.1.2-11.el7_9.x86_64

  • Switch to the hadoop user
  1. [root@master ~]# su - hadoop
  2. [hadoop@master ~]$
  3. [root@slave1 ~]# useradd hadoop
  4. [root@slave1 ~]# su - hadoop
  5. [hadoop@slave1 ~]$
  6. [root@slave2 ~]# useradd hadoop
  7. [root@slave2 ~]# su - hadoop
  8. [hadoop@slave2 ~]$

  • Generate a key pair on every node
  1. # Generate the key pair on master
  2. [hadoop@master ~]$ ssh-keygen -t rsa
  3. Generating public/private rsa key pair.
  4. Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  5. Created directory '/home/hadoop/.ssh'.
  6. Enter passphrase (empty for no passphrase):
  7. Enter same passphrase again:
  8. Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  9. Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  10. The key fingerprint is:
  11. SHA256:LOwqw+EjBHJRh9U1GdRHfbhV5+5BX+/hOHTEatwIKdU hadoop@master
  12. The key's randomart image is:
  13. +---[RSA 2048]----+
  14. | ..oo. o==...o+|
  15. | . .. . o.oE+.=|
  16. | . . o . *+|
  17. |o . . . . o B.+|
  18. |o. o S * =+|
  19. | .. . . o +oo|
  20. |.o . . o .o|
  21. |. * . . |
  22. | . +. |
  23. +----[SHA256]-----+
  24. # Generate the key pair on slave1
  25. [hadoop@slave1 ~]$ ssh-keygen -t rsa
  26. Generating public/private rsa key pair.
  27. Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  28. Created directory '/home/hadoop/.ssh'.
  29. Enter passphrase (empty for no passphrase):
  30. Enter same passphrase again:
  31. Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  32. Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  34. The key fingerprint is:
  35. SHA256:RhgNGuoa3uSrRMjhPtWA5NucyhbLr9NsEZ13i01LBaA
  36. hadoop@slave1
  37. The key's randomart image is:
  38. +---[RSA 2048]----+
  39. | . . o+... |
  40. |o .. o.o. . |
  41. | +..oEo . . |
  42. |+.=.+o o + |
  43. |o*.*... S o |
  44. |*oO. o + |
  45. |.@oo. |
  46. |o.o+. |
  47. | o=o |
  48. +----[SHA256]-----+
  49. # Generate the key pair on slave2
  50. [hadoop@slave2 ~]$ ssh-keygen -t rsa
  51. Generating public/private rsa key pair.
  52. Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  53. Enter passphrase (empty for no passphrase):
  54. Enter same passphrase again:
  55. Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  56. Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  57. The key fingerprint is:
  58. SHA256:yjp6AQEu2RN81Uv6y40MI/1p5WKWbVeGfB8/KK6iPUA
  59. hadoop@slave2
  60. The key's randomart image is:
  61. +---[RSA 2048]----+
  62. |.o. ... |
  63. |.oo.. o |
  64. |o.oo o . |
  65. |. .. E. . |
  66. | ... .S . . |
  67. | oo+.. . o +. |
  68. | o+* X +..o|
  69. | o..o& =... .o|
  70. | .o.o.=o+oo. .|
  71. +----[SHA256]-----+

  • Check that the /home/hadoop/ directory contains a .ssh folder and that the newly generated passphrase-free key pair is inside it
  1. [hadoop@master ~]$ ls ~/.ssh/
  2. id_rsa id_rsa.pub

  • Append id_rsa.pub to the authorized_keys file
  1. #master
  2. [hadoop@master ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. [hadoop@master ~]$ ls ~/.ssh/
  4. authorized_keys id_rsa id_rsa.pub
  5. #slave1
  6. [hadoop@slave1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  7. [hadoop@slave1 ~]$ ls ~/.ssh/
  8. authorized_keys id_rsa id_rsa.pub
  9. #slave2
  10. [hadoop@slave2 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  11. [hadoop@slave2 ~]$ ls ~/.ssh/
  12. authorized_keys id_rsa id_rsa.pub

  • Change the permissions of the authorized_keys file
  1. #master
  2. [hadoop@master ~]$ chmod 600 ~/.ssh/authorized_keys
  3. [hadoop@master ~]$ ll ~/.ssh/
  4. 总用量 12
  5. -rw-------. 1 hadoop hadoop 395 11月 14 16:18 authorized_keys
  6. -rw-------. 1 hadoop hadoop 1679 11月 14 16:14 id_rsa
  7. -rw-r--r--. 1 hadoop hadoop 395 11月 14 16:14 id_rsa.pub
  8. #slave1
  9. [hadoop@slave1 ~]$ chmod 600 ~/.ssh/authorized_keys
  10. [hadoop@slave1 ~]$ ll ~/.ssh/
  11. 总用量 12
  12. -rw-------. 1 hadoop hadoop 395 11月 14 16:18 authorized_keys
  13. -rw-------. 1 hadoop hadoop 1675 11月 14 16:14 id_rsa
  14. -rw-r--r--. 1 hadoop hadoop 395 11月 14 16:14 id_rsa.pub
  15. #slave2
  16. [hadoop@slave2 ~]$ chmod 600 ~/.ssh/authorized_keys
  17. [hadoop@slave2 ~]$ ll ~/.ssh/
  18. 总用量 12
  19. -rw-------. 1 hadoop hadoop 395 11月 14 16:19 authorized_keys
  20. -rw-------. 1 hadoop hadoop 1679 11月 14 16:15 id_rsa
  21. -rw-r--r--. 1 hadoop hadoop 395 11月 14 16:15 id_rsa.pub

  • Configure the SSH service
[root@master ~]# systemctl restart sshd
  • Switch to the hadoop user
  1. [root@master ~]# su - hadoop
  2. Last login: Mon Nov 14 16:11:14 CST 2022 on pts/1
  3. [hadoop@master ~]$
  • Verify SSH login to the local host
  1. [hadoop@master ~]$ ssh localhost
  2. The authenticity of host 'localhost (::1)' can't be established.
  3. ECDSA key fingerprint is
  4. SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.
  5. ECDSA key fingerprint is MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.
  6. Are you sure you want to continue connecting (yes/no)? yes
  7. Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
  8. Last login: Mon Nov 14 16:28:30 2022
  9. [hadoop@master ~]$
2) Exchange the SSH keys
  • Copy the master node's public key id_rsa.pub to every slave node
  1. Log in as the hadoop user and copy the key with the scp command:
  2. [hadoop@master ~]$ scp ~/.ssh/id_rsa.pub hadoop@slave1:~/
  3. hadoop@slave1's password:
  4. id_rsa.pub 100% 395 303.6KB/s 00:00
  5. [hadoop@master ~]$ scp ~/.ssh/id_rsa.pub hadoop@slave2:~/
  6. The authenticity of host 'slave2 (192.168.47.142)' can't be established.
  7. ECDSA key fingerprint is
  8. SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.
  9. ECDSA key fingerprint is MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.
  10. Are you sure you want to continue connecting (yes/no)? yes
  11. Warning: Permanently added 'slave2,192.168.47.142' (ECDSA) to the list of known
  12. hosts.
  13. hadoop@slave2's password:
  14. id_rsa.pub 100% 395 131.6KB/s 00:00
On the first remote connection, the system asks whether to continue connecting; enter "yes" to proceed.
Because key-based authentication has not been configured yet at this point, copying files with scp still
requires the password of the hadoop user on the slave1 node.
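As an aside, the copy-and-append steps below can also be done in a single command with ssh-copy-id, which ships with openssh-clients. A sketch (not used in this walkthrough):

  [hadoop@master ~]$ ssh-copy-id hadoop@slave1
  [hadoop@master ~]$ ssh-copy-id hadoop@slave2

ssh-copy-id appends the local public key to the remote ~/.ssh/authorized_keys and sets its permissions, so the manual cat and chmod steps are not needed.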
  • On each slave node, append the public key copied from the master to the authorized_keys file
  1. Log in to the slave1 and slave2 nodes as the hadoop user and run:
  2. [hadoop@slave1 ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys
  3. [hadoop@slave2 ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys
  • Delete the id_rsa.pub file on each slave node
  1. [hadoop@slave1 ~]$ rm -rf ~/id_rsa.pub
  2. [hadoop@slave2 ~]$ rm -rf ~/id_rsa.pub
  • Save each slave node's public key on the master
  1. 1) Copy the slave1 node's public key to the master:
  2. [hadoop@slave1 ~]$ scp ~/.ssh/id_rsa.pub hadoop@master:~/
  3. The authenticity of host 'master (192.168.47.140)' can't be established.
  4. ECDSA key fingerprint is
  5. SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.
  6. ECDSA key fingerprint is
  7. MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.
  8. Are you sure you want to continue connecting (yes/no)? yes
  9. Warning: Permanently added 'master,192.168.47.140' (ECDSA) to the list of
  10. known hosts.
  11. hadoop@master's password:
  12. id_rsa.pub 100% 395 317.8KB/s 00:00
  13. [hadoop@slave1 ~]$
  14. 2) On the master node, append the public key copied from the slave node to the authorized_keys file:
  15. [hadoop@master ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys
  16. 3) Delete the id_rsa.pub file on the master node:
  17. [hadoop@master ~]$ rm -rf ~/id_rsa.pub
  18. 4) Copy the slave2 node's public key to the master:
  19. [hadoop@slave2 ~]$ scp ~/.ssh/id_rsa.pub hadoop@master:~/
  20. The authenticity of host 'master (192.168.47.140)' can't be established.
  21. ECDSA key fingerprint is
  22. SHA256:KvO9HlwdCTJLStOxZWN7qrfRr8FJvcEw2hzWAF9b3bQ.
  23. ECDSA key fingerprint is MD5:07:91:56:9e:0b:55:05:05:58:02:15:5e:68:db:be:73.
  24. Are you sure you want to continue connecting (yes/no)? yes
  25. Warning: Permanently added 'master,192.168.47.140' (ECDSA) to the list of known
  26. hosts.
  27. hadoop@master's password:
  28. id_rsa.pub 100% 395 326.6KB/s 00:00
  29. [hadoop@slave2 ~]$
  30. 5) On the master node, append the public key copied from the slave node to the authorized_keys file:
  31. [hadoop@master ~]$ cat ~/id_rsa.pub >>~/.ssh/authorized_keys
  32. 6) Delete the id_rsa.pub file on the master node:
  33. [hadoop@master ~]$ rm -rf ~/id_rsa.pub
3) Verify passwordless SSH login
  • View the authorized_keys file on the master node
  1. [hadoop@master ~]$ cat ~/.ssh/authorized_keys
  2. ssh-rsa
  3. AAAAB3NzaC1yc2EAAAADAQABAAABAQDzHmpOfy7nwV1X453YY0UOZNTppiPA
  4. 9DI/vZWgWsK6hhw0pupzyxmG5LnNh7IhBlDCAKKmohOMUq9cKM3XMBq8R1f8
  5. ys8VOPlWSKYndGxu6mbTY8wdcPWvINlAvCf2GN6rE1QJXwBAYdvZ8n5UGWqbQ
  6. 0zdqQG1uhix9FN327dCmUGozmCuCR/lY4utU3ltS3faAz7GHUCchpPTE6OopaAk9
  7. yH5ynl+Y7BCwAWblcwf4pYoGWvQ8kMJIIr+k6cZXabsdwa3Y29OODsOsh4EfTmQ
  8. iQbjMKpLahVrJIiL8C/6vuDX8Fh3wvgkvFgrppfzsAYNpKro27JvVgRzdKg7+/BD
  9. hadoop@master
  10. ssh-rsa
  11. AAAAB3NzaC1yc2EAAAADAQABAAABAQDKUKduFzGYN41c0gFXdt3nALXhSqfgH
  12. gmZuSjJnIlpvtQQH1IYm2S50ticwk8fr2TL/lMC/THJbuP6xoT0ZlJBPkbcEBZwkTEd
  13. eb+0uvzUItx7viWb3oDs5s0UGtrQnrP70GszuNnitb+L+f6PRtUVVEYMKagyIpntfIC
  14. AIP8kMRKL3qrwOJ1smtEjwURKbOMDOJHV/EiHP4l+VeVtrPnH6MG3tZbrTTCgFQ
  15. ijSo8Hb4RGFO4NxtSHPH74YMwZBREZ7DPeZMNjqpAttQUH0leM4Ji93RQkcFoy2n
  16. lZljhmKVKzdqazhjJ4DAgT3/FcRvF7YrULKxOHHYj/Jk0rrWwB hadoop@slave1
  17. ssh-rsa
  18. AAAAB3NzaC1yc2EAAAADAQABAAABAQDjlopSpw5GUvoOSiEMQG15MRUrNqsAf
  19. NlnB/TcwDh7Xu7R1qND+StCb7rFScYI+NcDD0JkMBeXZVbQA5T21LSZlmet/38xeJ
  20. Jy53Jx6X1bmf/XnYYf2nnUPRkAUtJeKNPDDA4TN1qnhvAdoSUZgr3uW0oV01jW5
  21. Ai7YFYu1aSHsocmDRKFW2P8kpJZ3ASC7r7+dWFzMjT5Lu3/bjhluAPJESwV48aU2
  22. +wftlT4oJSGTc9vb0HnBpLoZ/yfuAC1TKsccI9p2MnItUUbqI1/uVH2dgmeHwRVpq
  23. qc1Em9hcVh0Gs0vebIGPRNx5eHTf3aIrxR4eRFSwMgF0QkcFr/+yzp
  24. hadoop@slave2
  25. [hadoop@master ~]$
  26. You can see that the master node's authorized_keys file now contains the public keys of all three
  27. nodes: master, slave1, and slave2.
  • View the authorized_keys file on the slave nodes
  1. [hadoop@slave1 ~]$ cat ~/.ssh/authorized_keys
  2. ssh-rsa
  3. AAAAB3NzaC1yc2EAAAADAQABAAABAQDKUKduFzGYN41c0gFXdt3nALXhS
  4. qfgHgmZuSjJnIlpvtQQH1IYm2S50ticwk8fr2TL/lMC/THJbuP6xoT0ZlJBPkbcE
  5. BZwkTEdeb+0uvzUItx7viWb3oDs5s0UGtrQnrP70GszuNnitb+L+f6PRtUVVEY
  6. MKagyIpntfICAIP8kMRKL3qrwOJ1smtEjwURKbOMDOJHV/EiHP4l+VeVtrPnH
  7. 6MG3tZbrTTCgFQijSo8Hb4RGFO4NxtSHPH74YMwZBREZ7DPeZMNjqpAttQU
  8. H0leM4Ji93RQkcFoy2nlZljhmKVKzdqazhjJ4DAgT3/FcRvF7YrULKxOHHYj/Jk
  9. 0rrWwB hadoop@slave1
  10. ssh-rsa
  11. AAAAB3NzaC1yc2EAAAADAQABAAABAQDzHmpOfy7nwV1X453YY0UOZNTp
  12. piPA9DI/vZWgWsK6hhw0pupzyxmG5LnNh7IhBlDCAKKmohOMUq9cKM3XM
  13. Bq8R1f8ys8VOPlWSKYndGxu6mbTY8wdcPWvINlAvCf2GN6rE1QJXwBAYdvZ
  14. 8n5UGWqbQ0zdqQG1uhix9FN327dCmUGozmCuCR/lY4utU3ltS3faAz7GHUCc
  15. hpPTE6OopaAk9yH5ynl+Y7BCwAWblcwf4pYoGWvQ8kMJIIr+k6cZXabsdwa3
  16. Y29OODsOsh4EfTmQiQbjMKpLahVrJIiL8C/6vuDX8Fh3wvgkvFgrppfzsAYNpK
  17. ro27JvVgRzdKg7+/BD hadoop@master
  18. [hadoop@slave2 ~]$ cat ~/.ssh/authorized_keys
  19. ssh-rsa
  20. AAAAB3NzaC1yc2EAAAADAQABAAABAQDjlopSpw5GUvoOSiEMQG15MRUrN
  21. qsAfNlnB/TcwDh7Xu7R1qND+StCb7rFScYI+NcDD0JkMBeXZVbQA5T21LSZl
  22. met/38xeJJy53Jx6X1bmf/XnYYf2nnUPRkAUtJeKNPDDA4TN1qnhvAdoSUZgr3
  23. uW0oV01jW5Ai7YFYu1aSHsocmDRKFW2P8kpJZ3ASC7r7+dWFzMjT5Lu3/bj
  24. hluAPJESwV48aU2+wftlT4oJSGTc9vb0HnBpLoZ/yfuAC1TKsccI9p2MnItUUbq
  25. I1/uVH2dgmeHwRVpqqc1Em9hcVh0Gs0vebIGPRNx5eHTf3aIrxR4eRFSwMg
  26. F0QkcFr/+yzp hadoop@slave2
  27. ssh-rsa
  28. AAAAB3NzaC1yc2EAAAADAQABAAABAQDzHmpOfy7nwV1X453YY0UOZNTp
  29. piPA9DI/vZWgWsK6hhw0pupzyxmG5LnNh7IhBlDCAKKmohOMUq9cKM3XM
  30. Bq8R1f8ys8VOPlWSKYndGxu6mbTY8wdcPWvINlAvCf2GN6rE1QJXwBAYdvZ
  31. 8n5UGWqbQ0zdqQG1uhix9FN327dCmUGozmCuCR/lY4utU3ltS3faAz7GHUCc
  32. hpPTE6OopaAk9yH5ynl+Y7BCwAWblcwf4pYoGWvQ8kMJIIr+k6cZXabsdwa3
  33. Y29OODsOsh4EfTmQiQbjMKpLahVrJIiL8C/6vuDX8Fh3wvgkvFgrppfzsAYNpK
  34. ro27JvVgRzdKg7+/BD hadoop@master
  35. You can see that each slave node's authorized_keys file contains two public keys: the master node's
  36. and that slave's own.
  • Verify passwordless login from the master to each slave node
  1. Log in to the master node as the hadoop user and use ssh to log in to slave1 and slave2; you can
  2. see that the SSH login completes without asking for a password.
  3. [hadoop@master ~]$ ssh slave1
  4. Last login: Mon Nov 14 16:34:56 2022
  5. [hadoop@slave1 ~]$
  6. [hadoop@master ~]$ ssh slave2
  7. Last login: Mon Nov 14 16:49:34 2022 from 192.168.47.140
  8. [hadoop@slave2 ~]$
  • Verify passwordless login from both slave nodes to the master node
  1. [hadoop@slave1 ~]$ ssh master
  2. Last login: Mon Nov 14 16:30:45 2022 from ::1
  3. [hadoop@master ~]$
  4. [hadoop@slave2 ~]$ ssh master
  5. Last login: Mon Nov 14 16:50:49 2022 from 192.168.47.141
  6. [hadoop@master ~]$
  • Configure the JDK environment on the two slave nodes, slave1 and slave2.
  1. [root@master ~]# cd /usr/local/src/
  2. [root@master src]# ls
  3. hadoop-2.7.1 jdk1.8.0_152
  4. [root@master src]# scp -r jdk1.8.0_152 root@slave1:/usr/local/src/
  5. [root@master src]# scp -r jdk1.8.0_152 root@slave2:/usr/local/src/
  6. #slave1
  7. [root@slave1 ~]# ls /usr/local/src/
  8. jdk1.8.0_152
  9. [root@slave1 ~]# vi /etc/profile # add the following two lines at the end of this file
  10. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  11. export PATH=$PATH:$JAVA_HOME/bin
  12. [root@slave1 ~]# source /etc/profile
  13. [root@slave1 ~]# java -version
  14. java version "1.8.0_152"
  15. Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
  16. Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
  17. #slave2
  18. [root@slave2 ~]# ls /usr/local/src/
  19. jdk1.8.0_152
  20. [root@slave2 ~]# vi /etc/profile # add the following two lines at the end of this file
  21. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  22. export PATH=$PATH:$JAVA_HOME/bin
  23. [root@slave2 ~]# source /etc/profile
  24. [root@slave2 ~]# java -version
  25. java version "1.8.0_152"
  26. Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
  27. Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

V. Running the Hadoop Cluster

1. Install Hadoop on the master node

  1. 1. Rename the hadoop-2.7.1 directory to hadoop:
  2. [root@master ~]# cd /usr/local/src/
  3. [root@master src]# mv hadoop-2.7.1 hadoop
  4. [root@master src]# ls
  5. hadoop jdk1.8.0_152
  6. 2. Configure the Hadoop environment variables:
  7. [root@master src]# yum install -y vim
  8. [root@master src]# vim /etc/profile
  9. [root@master src]# tail -n 4 /etc/profile
  10. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  11. export PATH=$PATH:$JAVA_HOME/bin
  12. export HADOOP_HOME=/usr/local/src/hadoop
  13. export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
  14. 3. Make the configured Hadoop environment variables take effect:
  15. [root@master src]# su - hadoop
  16. Last login: Mon Feb 28 15:55:37 CST 2022 from 192.168.41.143 on pts/1
  17. [hadoop@master ~]$ source /etc/profile
  18. [hadoop@master ~]$ exit
  19. logout
  20. 4. Run the following commands to modify the hadoop-env.sh configuration file:
  21. [root@master src]# cd /usr/local/src/hadoop/etc/hadoop/
  22. [root@master hadoop]# vim hadoop-env.sh # change the following setting
  23. export JAVA_HOME=/usr/local/src/jdk1.8.0_152

2. Configure the hdfs-site.xml parameters

  1. [root@master hadoop]# vim hdfs-site.xml # add the following content
  2. [root@master hadoop]# tail -n 14 hdfs-site.xml
  3. <configuration>
  4. <property>
  5. <name>dfs.namenode.name.dir</name>
  6. <value>file:/usr/local/src/hadoop/dfs/name</value>
  7. </property>
  8. <property>
  9. <name>dfs.datanode.data.dir</name>
  10. <value>file:/usr/local/src/hadoop/dfs/data</value>
  11. </property>
  12. <property>
  13. <name>dfs.replication</name>
  14. <value>3</value>
  15. </property>
  16. </configuration>

3. Configure the core-site.xml parameters

  1. [root@master hadoop]# vim core-site.xml # add the following content
  2. [root@master hadoop]# tail -n 14 core-site.xml
  3. <configuration>
  4. <property>
  5. <name>fs.defaultFS</name>
  6. <value>hdfs://192.168.47.140:9000</value>
  7. </property>
  8. <property>
  9. <name>io.file.buffer.size</name>
  10. <value>131072</value>
  11. </property>
  12. <property>
  13. <name>hadoop.tmp.dir</name>
  14. <value>file:/usr/local/src/hadoop/tmp</value>
  15. </property>
  16. </configuration>

4. Configure mapred-site.xml

  1. [root@master hadoop]# pwd
  2. /usr/local/src/hadoop/etc/hadoop
  3. [root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  4. [root@master hadoop]# vim mapred-site.xml # add the following configuration
  5. [root@master hadoop]# tail -n 14 mapred-site.xml
  6. <configuration>
  7. <property>
  8. <name>mapreduce.framework.name</name>
  9. <value>yarn</value>
  10. </property>
  11. <property>
  12. <name>mapreduce.jobhistory.address</name>
  13. <value>master:10020</value>
  14. </property>
  15. <property>
  16. <name>mapreduce.jobhistory.webapp.address</name>
  17. <value>master:19888</value>
  18. </property>
  19. </configuration>

5. Configure yarn-site.xml

  1. [root@master hadoop]# vim yarn-site.xml # add the following configuration
  2. [root@master hadoop]# tail -n 32 yarn-site.xml
  3. <configuration>
  4. <!-- Site specific YARN configuration properties -->
  5. <property>
  6. <name>yarn.resourcemanager.address</name>
  7. <value>master:8032</value>
  8. </property>
  9. <property>
  10. <name>yarn.resourcemanager.scheduler.address</name>
  11. <value>master:8030</value>
  12. </property>
  13. <property>
  14. <name>yarn.resourcemanager.resource-tracker.address</name>
  15. <value>master:8031</value>
  16. </property>
  17. <property>
  18. <name>yarn.resourcemanager.admin.address</name>
  19. <value>master:8033</value>
  20. </property>
  21. <property>
  22. <name>yarn.resourcemanager.webapp.address</name>
  23. <value>master:8088</value>
  24. </property>
  25. <property>
  26. <name>yarn.nodemanager.aux-services</name>
  27. <value>mapreduce_shuffle</value>
  28. </property>
  29. <property>
  30. <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
  31. <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  32. </property>
  33. </configuration>

6. Other Hadoop configuration

  1. 1. Configure the masters file:
  2. [root@master hadoop]# vim masters
  3. [root@master hadoop]# cat masters
  4. 192.168.47.140
  5. 2. Configure the slaves file:
  6. [root@master hadoop]# vim slaves
  7. [root@master hadoop]# cat slaves
  8. 192.168.47.141
  9. 192.168.47.142
  10. 3. Create the required directories:
  11. [root@master hadoop]# mkdir /usr/local/src/hadoop/tmp
  12. [root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/name -p
  13. [root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/data -p
  14. 4. Change the directory ownership:
  15. [root@master hadoop]# chown -R hadoop:hadoop /usr/local/src/hadoop/
  16. 5. Copy the configuration to the slave nodes:
  17. [root@master ~]# scp -r /usr/local/src/hadoop/ root@slave1:/usr/local/src/
  18. The authenticity of host 'slave1 (192.168.47.141)' can't be established.
  19. ECDSA key fingerprint is SHA256:vnHclJTJVtDbeULN8jdOLhTCmqxJNqUQshH9g9LfJ3k.
  20. ECDSA key fingerprint is MD5:31:03:3d:83:46:aa:c4:d0:c9:fc:5f:f1:cf:2d:fd:e2.
  21. Are you sure you want to continue connecting (yes/no)? yes
  22. * * * * * * *
  23. [root@master ~]# scp -r /usr/local/src/hadoop/ root@slave2:/usr/local/src/
  24. The authenticity of host 'slave2 (192.168.47.142)' can't be established.
  25. ECDSA key fingerprint is SHA256:vnHclJTJVtDbeULN8jdOLhTCmqxJNqUQshH9g9LfJ3k.
  26. ECDSA key fingerprint is MD5:31:03:3d:83:46:aa:c4:d0:c9:fc:5f:f1:cf:2d:fd:e2.
  27. Are you sure you want to continue connecting (yes/no)? yes
  28. * * * * * * *
  29. # slave1 configuration
  30. [root@slave1 ~]# yum install -y vim
  31. [root@slave1 ~]# vim /etc/profile
  32. [root@slave1 ~]# tail -n 4 /etc/profile
  33. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  34. export PATH=$PATH:$JAVA_HOME/bin
  35. export HADOOP_HOME=/usr/local/src/hadoop
  36. export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
  37. [root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hadoop/
  38. [root@slave1 ~]# su - hadoop
  39. Last login: Thu Feb 24 11:29:00 CST 2022 from 192.168.41.148 on pts/1
  40. [hadoop@slave1 ~]$ source /etc/profile
  41. # slave2 configuration
  42. [root@slave2 ~]# yum install -y vim
  43. [root@slave2 ~]# vim /etc/profile
  44. [root@slave2 ~]# tail -n 4 /etc/profile
  45. export JAVA_HOME=/usr/local/src/jdk1.8.0_152
  46. export PATH=$PATH:$JAVA_HOME/bin
  47. export HADOOP_HOME=/usr/local/src/hadoop
  48. export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
  49. [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/hadoop/
  50. [root@slave2 ~]# su - hadoop
  51. Last login: Thu Feb 24 11:29:19 CST 2022 from 192.168.41.148 on pts/1
  52. [hadoop@slave2 ~]$ source /etc/profile

VI. Running the Big Data Platform Cluster

1. Hadoop formatting

1) Format the NameNode
  1. [root@master ~]# su - hadoop
  2. [hadoop@master ~]$ cd /usr/local/src/hadoop/
  3. [hadoop@master hadoop]$ bin/hdfs namenode -format
  4. Result:
  5. 20/05/02 16:21:50 INFO namenode.NameNode: SHUTDOWN_MSG:
  6. /************************************************************ SHUTDOWN_MSG:
  7. Shutting down NameNode at master/192.168.1.6
  8. ************************************************************/

Note: formatting wipes the data on the NameNode. HDFS must be formatted before it is started for the first time; later starts must not format again, otherwise the DataNode processes will go missing. Also, once HDFS has been run, the Hadoop working directory (set to /usr/local/src/hadoop/tmp in this guide) contains data; if you need to reformat, you must first delete the data under the working directory, otherwise the reformat will run into problems (see the sketch below).
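A minimal cleanup sketch for that re-format case, using the directory layout configured in this guide (stop HDFS first; clear name and tmp on master and the data directory on every DataNode):

  [hadoop@master hadoop]$ rm -rf /usr/local/src/hadoop/tmp/* /usr/local/src/hadoop/dfs/name/*
  [hadoop@slave1 ~]$ rm -rf /usr/local/src/hadoop/dfs/data/*
  [hadoop@slave2 ~]$ rm -rf /usr/local/src/hadoop/dfs/data/*
  [hadoop@master hadoop]$ bin/hdfs namenode -format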

2) Start the NameNode
  1. [hadoop@master hadoop]$ hadoop-daemon.sh start namenode
  2. starting namenode, logging to /opt/module/hadoop-
  3. 2.7.1/logs/hadoop-hadoop-namenode-master.out

2. Check the Java processes

  1. After startup, the jps command can be used to check whether it succeeded. jps is a Java tool that
  2. lists the PIDs of all current Java processes.
  3. [hadoop@master hadoop]$ jps
  4. 3557 NameNode
  5. 3624 Jps
1) Start the DataNode on the slave nodes
  1. [hadoop@slave1 hadoop]$ hadoop-daemon.sh start datanode
  2. starting datanode, logging to /opt/module/hadoop-
  3. 2.7.1/logs/hadoop-hadoop-datanode-master.out
  4. [hadoop@slave2 hadoop]$ hadoop-daemon.sh start datanode
  5. starting datanode, logging to /opt/module/hadoop-
  6. 2.7.1/logs/hadoop-hadoop-datanode-master.out
  7. [hadoop@slave1 hadoop]$ jps
  8. 3557 DataNode
  9. 3725 Jps
  10. [hadoop@slave2 hadoop]$ jps
  11. 3557 DataNode
  12. 3725 Jps
2) Start the SecondaryNameNode
  1. [hadoop@master hadoop]$ hadoop-daemon.sh start secondarynamenode
  2. starting secondarynamenode, logging to /opt/module/hadoop-
  3. 2.7.1/logs/hadoop-hadoop-secondarynamenode-master.out
  4. [hadoop@master hadoop]$ jps
  5. 34257 NameNode
  6. 34449 SecondaryNameNode
  7. 34494 Jps
3) Check where HDFS stores its data
  1. [hadoop@master hadoop]$ ll dfs/
  2. 总用量 0
  3. drwx------ 3 hadoop hadoop 21 8月 14 15:26 data
  4. drwxr-xr-x 3 hadoop hadoop 40 8月 14 14:57 name
  5. [hadoop@master hadoop]$ ll ./tmp/dfs
  6. 总用量 0
  7. drwxrwxr-x. 3 hadoop hadoop 21 5月 2 16:34 namesecondary
  8. This shows that HDFS keeps its data under /usr/local/src/hadoop/dfs, where the NameNode and the
  9. DataNode each have their own directory, and under /usr/local/src/hadoop/tmp/, where the
  10. SecondaryNameNode has its own directory.

3. View the HDFS report

  1. [hadoop@master sbin]$ hdfs dfsadmin -report
  2. Configured Capacity: 8202977280 (7.64 GB)
  3. Present Capacity: 4421812224 (4.12 GB)
  4. DFS Remaining: 4046110720 (3.77 GB)
  5. DFS Used: 375701504 (358.30 MB)
  6. DFS Used%: 8.50%
  7. Under replicated blocks: 88
  8. Blocks with corrupt replicas: 0
  9. Missing blocks: 0
  10. -------------------------------------------------
  11. Live datanodes (2):
  12. Name: 192.168.47.141:50010 (slave1)
  13. Hostname: slave1
  14. Decommission Status : Normal
  15. Configured Capacity: 4101488640 (3.82 GB)
  16. DFS Used: 187850752 (179.15 MB)
  17. Non DFS Used: 2109939712 (1.97 GB)
  18. DFS Remaining: 1803698176 (1.68 GB)
  19. DFS Used%: 4.58%
  20. DFS Remaining%: 43.98%
  21. Configured Cache Capacity: 0 (0 B)
  22. Cache Used: 0 (0 B)
  23. Cache Remaining: 0 (0 B)
  24. Cache Used%: 100.00%
  25. Cache Remaining%: 0.00%
  26. Xceivers: 1
  27. Last contact: Mon May 04 18:32:32 CST 2020
  28. Name: 192.168.47.142:50010 (slave2)
  29. Hostname: slave2
  30. Decommission Status : Normal
  31. Configured Capacity: 4101488640 (3.82 GB)
  32. DFS Used: 187850752 (179.15 MB)
  33. Non DFS Used: 1671225344 (1.56 GB)
  34. DFS Remaining: 2242412544 (2.09 GB)
  35. DFS Used%: 4.58%
  36. DFS Remaining%: 54.67%
  37. Configured Cache Capacity: 0 (0 B)
  38. Cache Used: 0 (0 B)
  39. Cache Remaining: 0 (0 B)
  40. Cache Used%: 100.00%
  41. Cache Remaining%: 0.00%
  42. Xceivers: 1
  43. Last contact: Mon May 04 18:32:32 CST 2020

4. Check the node status in a browser

Enter http://master:50070 in the browser address bar to open a page showing NameNode and
DataNode information.

Enter http://master:50090 in the browser address bar to open a page showing
SecondaryNameNode information.
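If no desktop browser is available, the same web ports can be probed from the shell. A sketch (it only checks that the HTTP endpoints respond, not the page contents):

  [hadoop@master ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://master:50070/
  [hadoop@master ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://master:50090/
  # An HTTP status of 200 indicates that the corresponding web UI is up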
HDFS can also be started with the start-dfs.sh command. This requires passwordless SSH to be
configured; otherwise the startup repeatedly asks you to confirm connections and enter the hadoop user's password.
  1. [hadoop@master hadoop]$ stop-dfs.sh
  2. [hadoop@master hadoop]$ start-dfs.sh

VII. Running a Test

1. Create the data input directory in the HDFS file system

  1. [hadoop@master hadoop]$ start-yarn.sh
  2. [hadoop@master hadoop]$ jps
  3. 34257 NameNode
  4. 34449 SecondaryNameNode
  5. 34494 Jps
  6. 32847 ResourceManager
If this is the first time a MapReduce program is run, a data input directory must first be created in the
HDFS file system to hold the input data. Here the /input directory is used for the input data. Run the
following command to create /input in the HDFS file system:
  1. [hadoop@master hadoop]$ hdfs dfs -mkdir /input
  2. [hadoop@master hadoop]$ hdfs dfs -ls /
  3. Found 1 items
  4. drwxr-xr-x - hadoop supergroup 0 2020-05-02 22:26
  5. /input
  6. The /input directory created here is inside the HDFS file system and can only be viewed and manipulated with HDFS commands.

2. Copy the input data file into the HDFS /input directory

  1. [hadoop@master hadoop]$ cat ~/input/data.txt
  2. Hello World
  3. Hello Hadoop
  4. Hello Huasan
  5. Run the following command to copy the input data file into the HDFS /input directory:
  6. [hadoop@master hadoop]$ hdfs dfs -put ~/input/data.txt /input
  7. Confirm that the file has been copied into the HDFS /input directory:
  8. [hadoop@master hadoop]$ hdfs dfs -ls /input
  9. Found 1 items
  10. -rw-r--r-- 1 hadoop supergroup 38 2020-05-02 22:32
  11. /input/data.txt

3. Run the WordCount example to count the frequency of each word in the data file

  1. The automatically created /output directory is located in the HDFS file
  2. system and is viewed and manipulated with HDFS commands.
  3. [hadoop@master hadoop]$ hdfs dfs -mkdir /output
  4. First run the following command to list the files in HDFS:
  5. [hadoop@master hadoop]$ hdfs dfs -ls /
  6. Found 3 items
  7. drwxr-xr-x - hadoop supergroup 0 2020-05-02 22:32
  8. /input
  9. drwxr-xr-x - hadoop supergroup 0 2020-05-02 22:49
  10. /output
  11. Of the directories above, /input holds the input data and /output holds the output data.
  12. Because the output directory must not already exist when the job runs, delete /output with the following command:
  13. [hadoop@master hadoop]$ hdfs dfs -rm -r -f /output
  14. 20/05/03 09:43:43 INFO fs.TrashPolicyDefault: Namenode trash
  15. configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes. Deleted
  16. /output
  17. Run the WordCount example with the following command:
  18. [hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
  19. wordcount /input/data.txt /output
  20. The output of the MapReduce program while it runs looks like this:
  21. 20/05/02 22:39:41 INFO client.RMProxy: Connecting to
  22. ResourceManager at localhost/127.0.0.1:8032
  23. 20/05/02 22:39:43 INFO input.FileInputFormat: Total input paths
  24. to process : 1
  25. 20/05/02 22:39:43 INFO mapreduce.JobSubmitter: number of
  26. splits:1
  27. 20/05/02 22:39:44 INFO mapreduce.JobSubmitter: Submitting tokens
  28. for job: job_1588469277215_0001
  29. ...... (omitted) ......
  30. 20/05/02 22:40:32 INFO mapreduce.Job: map 0% reduce 0%
  31. 20/05/02 22:41:07 INFO mapreduce.Job: map 100% reduce 0%
  32. 20/05/02 22:41:25 INFO mapreduce.Job: map 100% reduce 100%
  33. 20/05/02 22:41:27 INFO mapreduce.Job: Job job_1588469277215_0001 completed successfully
  34. ...... (omitted) ......
  35. The messages above show that the MapReduce program submitted a job, which performs the Map
  36. phase first and then the Reduce phase. The progress of a MapReduce job can also be watched in the
  37. YARN cluster web UI: enter http://master:8088 in the browser address bar.

Enter http://master:50070 in the browser address bar, open the Utilities menu, and choose
Browse the file system to browse the contents of the HDFS file system.

  1. The content of the part-r-00000 file can also be viewed directly with an HDFS command, as shown below:
  2. [hadoop@master hadoop]$ hdfs dfs -cat /output/part-r-00000
  3. Hadoop 1
  4. Hello 3
  5. Huasan 1
  6. World 1
  7. The word counts are correct, which shows that Hadoop is running properly (a local cross-check follows below).
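As a final cross-check, the same counts can be reproduced locally with standard shell tools (a sketch; it assumes the ~/input/data.txt file from the earlier step is still present):

  [hadoop@master hadoop]$ tr -s ' ' '\n' < ~/input/data.txt | sort | uniq -c
  # Expected: 1 Hadoop, 3 Hello, 1 Huasan, 1 World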