
Setting Up Hadoop Pseudo-Distributed Mode on CentOS 7

 

Preface

Before you begin, prepare the following:

  1. VM image: CentOS-6.5-x86_64-bin-DVD1.iso
     Link: https://pan.baidu.com/s/1O9a-6Sn7riGWG3mVQssTGg  Extraction code: rud1
     (Note: this guide targets CentOS 7, so despite the link above, the commands below assume a CentOS 7 image.)
  2. JDK: jdk-8u144-linux-x64.tar.gz
     Link: https://pan.baidu.com/s/1TdaCDaT_qriDMjbYFyphPw  Extraction code: qulj
  3. Hadoop: hadoop-2.7.2.tar.gz
     Link: https://pan.baidu.com/s/1Wt0mAUHKJDSYTUM5-u6CYw  Extraction code: oofe
  4. Or from the official archive: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/

If the Baidu Cloud downloads are slow, you can get the same files from the major open-source mirrors or the official sites. Note that the walkthrough below uses hadoop-2.7.7; if you download a different version, substitute its version number in the paths and commands.
I use Xshell, a very convenient terminal client; if you're interested, download it from its official site.

This article assumes CentOS 7 is already installed and the JDK is already set up; for the latter, see the linked JDK installation guide.


I. Preliminary Environment Setup

Disable the firewall

    # Check the firewall status
    [root@localhost dr]# firewall-cmd --state
    running
    # Stop the firewall
    [root@localhost dr]# systemctl stop firewalld.service
    [root@localhost dr]# firewall-cmd --state
    not running
    # Prevent firewalld from starting at boot
    [root@localhost dr]# systemctl disable firewalld.service
    Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

Edit the hosts file and verify that the hostname can be pinged

    # Check the local IP address: 192.168.23.128
    [root@localhost dr]# ifconfig
    ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.23.128 netmask 255.255.255.0 broadcast 192.168.23.255
        inet6 fe80::5895:a1c7:da57:e4ad prefixlen 64 scopeid 0x20<link>
        ether 00:0c:29:a1:55:a1 txqueuelen 1000 (Ethernet)
        RX packets 213928 bytes 299951288 (286.0 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 22291 bytes 2345515 (2.2 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    # Edit the hosts file and add the master entry
    [root@localhost dr]# vi /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.23.128 master
    # Test whether the entry works
    [root@localhost dr]# ping master
    PING master (192.168.23.128) 56(84) bytes of data.
    64 bytes from master (192.168.23.128): icmp_seq=1 ttl=64 time=0.020 ms
    64 bytes from master (192.168.23.128): icmp_seq=2 ttl=64 time=0.113 ms
    64 bytes from master (192.168.23.128): icmp_seq=3 ttl=64 time=0.023 ms
    64 bytes from master (192.168.23.128): icmp_seq=4 ttl=64 time=0.122 ms
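
Optionally, you can also make the machine's hostname itself master, so the shell prompt matches the hosts entry (a small extra step, not required for the rest of the guide; hostnamectl is the standard CentOS 7 tool for this):

    # Set the static hostname to master; log out and back in to see the new prompt
    [root@localhost dr]# hostnamectl set-hostname master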

Set up passwordless SSH login

    [root@localhost dr]# ssh-keygen    # press Enter at every prompt
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:8CSgRg7wOr5NWlwL1A17rW3CyA9X7RkpFbvP2MAHL4A root@master
    The key's randomart image is:
    +---[RSA 2048]----+
    |+ . .. o. |
    | = ...+ o o o |
    | =. ooE.= * |
    | o. . +=+ = * |
    |o . = =So B o |
    |... o = o O |
    | . + . . . + |
    | * |
    | o . |
    +----[SHA256]-----+
    # Copy the public key to master for passwordless login
    [root@localhost .ssh]# ssh-copy-id master
    /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
    /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
    (if you think this is a mistake, you may want to use -f option)
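
To confirm that the passwordless login actually works, try an SSH command against master; it should complete without prompting for a password (a quick check under the setup above):

    # No password prompt means the key was installed correctly
    [root@localhost .ssh]# ssh master exit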

 

II. Installing Hadoop

1. Unpack Hadoop

  • First, put the Hadoop tarball in a local directory. Mine is /home/dr/Datafile/, and the rest of this article uses that path.

  • Create a hadoop folder under /usr/local/:

    [root@localhost /]# mkdir /usr/local/hadoop
    [root@localhost /]# cd /usr/local/
    [root@localhost local]# ll
    total 0
    drwxr-xr-x. 2 root root 6 Nov 5 2016 bin
    drwxr-xr-x. 2 root root 6 Nov 5 2016 etc
    drwxr-xr-x. 2 root root 6 Nov 5 2016 games
    drwxr-xr-x. 2 root root 6 Mar 29 01:32 hadoop
    drwxr-xr-x. 2 root root 6 Nov 5 2016 include
    drwxr-xr-x. 3 root root 26 Mar 28 04:43 java
    drwxr-xr-x. 2 root root 6 Nov 5 2016 lib
    drwxr-xr-x. 2 root root 6 Nov 5 2016 lib64
    drwxr-xr-x. 2 root root 6 Nov 5 2016 libexec
    drwxr-xr-x. 2 root root 6 Nov 5 2016 sbin
    drwxr-xr-x. 5 root root 49 Mar 28 01:26 share
    drwxr-xr-x. 2 root root 6 Nov 5 2016 src

     

  • Extract the Hadoop tarball into that folder:

    [root@localhost local]# tar -zxvf /home/dr/Datafile/hadoop-2.7.7.tar.gz -C /usr/local/hadoop/
    hadoop-2.7.7/
    hadoop-2.7.7/bin/
    hadoop-2.7.7/bin/hadoop.cmd
    hadoop-2.7.7/bin/rcc
    hadoop-2.7.7/bin/test-container-executor
    hadoop-2.7.7/bin/mapred
    hadoop-2.7.7/bin/yarn
    hadoop-2.7.7/bin/yarn.cmd
    hadoop-2.7.7/bin/hadoop

    The -C option is what extracts the archive into the specified directory.

2. Add Hadoop to the environment variables

  • Get the Hadoop installation path:

    [root@localhost hadoop-2.7.7]# pwd
    /usr/local/hadoop/hadoop-2.7.7

     

  • Edit /etc/profile and append the environment configuration at the end:

    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.7
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

     

  • Reload the file so the changes take effect, then check that Hadoop is installed correctly:

    [root@localhost hadoop-2.7.7]# source /etc/profile
    [root@localhost hadoop-2.7.7]# hadoop version
    Hadoop 2.7.7
    Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac
    Compiled by stevel on 2018-07-18T22:47Z
    Compiled with protoc 2.5.0
    From source with checksum 792e15d20b12c74bd6f19a1fb886490
    This command was run using /usr/local/hadoop/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar

     

3. Hadoop directory layout

    [root@localhost hadoop-2.7.7]# ll
    total 112
    drwxr-xr-x. 2 dr ftp 194 Jul 18 2018 bin          # scripts for operating the Hadoop services (HDFS, YARN)
    drwxr-xr-x. 3 dr ftp 20 Jul 18 2018 etc           # Hadoop's configuration file directory
    drwxr-xr-x. 2 dr ftp 106 Jul 18 2018 include
    drwxr-xr-x. 3 dr ftp 20 Jul 18 2018 lib           # Hadoop's native libraries (data compression/decompression)
    drwxr-xr-x. 2 dr ftp 239 Jul 18 2018 libexec
    -rw-r--r--. 1 dr ftp 86424 Jul 18 2018 LICENSE.txt
    -rw-r--r--. 1 dr ftp 14978 Jul 18 2018 NOTICE.txt
    -rw-r--r--. 1 dr ftp 1366 Jul 18 2018 README.txt
    drwxr-xr-x. 2 dr ftp 4096 Jul 18 2018 sbin
    drwxr-xr-x. 4 dr ftp 31 Jul 18 2018 share         # Hadoop's dependency jars, documentation, and official examples

 

III. Hadoop Pseudo-Distributed Configuration (the important part)

1. Notes on the configuration files

The Hadoop distribution already ships with an etc/hadoop directory inside the install directory. Unless otherwise noted, all of the configuration files below live in /usr/local/hadoop/hadoop-2.7.7/etc/hadoop.

2. Configure hadoop-env.sh (under hadoop-2.7.7/etc/hadoop/)

[root@localhost hadoop]# vi hadoop-env.sh

To make the lines we need easier to find, enter command mode in vi and type :se nu to display line numbers.

Then change line 25 to your JDK directory and line 33 to your Hadoop configuration directory (note that the Hadoop path must end with /etc/hadoop).

    # Before
    24 # The java implementation to use.
    25 export JAVA_HOME=${JAVA_HOME}
    26
    27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
    28 # that bind to privileged ports to provide authentication of data transfer
    29 # protocol. Jsvc is not required if SASL is configured for authentication of
    30 # data transfer protocol using non-privileged ports.
    31 #export JSVC_HOME=${JSVC_HOME}
    32
    33 export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

    # After
    24 # The java implementation to use.
    25 export JAVA_HOME=/usr/local/java/jdk1.8.0_171
    26
    27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
    28 # that bind to privileged ports to provide authentication of data transfer
    29 # protocol. Jsvc is not required if SASL is configured for authentication of
    30 # data transfer protocol using non-privileged ports.
    31 #export JSVC_HOME=${JSVC_HOME}
    32
    33 export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.7/etc/hadoop

Save and quit (ESC, then :wq!) so the configuration takes effect.

3. Configure four files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml

First, create a new tmp folder under the Hadoop directory /usr/local/hadoop/hadoop-2.7.7, as shown below.
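
A one-liner for this step (mkdir -p also creates any missing parent directories):

    # Create the directory that hadoop.tmp.dir in core-site.xml will point to
    [root@localhost hadoop-2.7.7]# mkdir -p /usr/local/hadoop/hadoop-2.7.7/tmp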

Edit core-site.xml:

    [root@localhost hadoop]# vi core-site.xml
    <configuration>
      <property>
        <!-- master is your own hostname -->
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <!-- point this at your own tmp directory -->
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/hadoop-2.7.7/tmp</value>
      </property>
    </configuration>

Note: fs.default.name is the deprecated spelling of fs.defaultFS; both work on Hadoop 2.7.x.

Edit hdfs-site.xml:

    vim hdfs-site.xml
    # copy this as-is
    <configuration>
      <property>
        <!-- one replica is enough on a single-node (pseudo-distributed) cluster -->
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <!-- HDFS permission checking; false lets any user operate on HDFS files -->
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    </configuration>

Edit mapred-site.xml

This file doesn't exist initially, but a template, mapred-site.xml.template, does, so copy the template and rename the copy to mapred-site.xml:

cp ./mapred-site.xml.template ./mapred-site.xml

Then open mapred-site.xml for editing:

    vim mapred-site.xml
    # copy this as-is
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

Edit yarn-site.xml:

    vim yarn-site.xml
    # copy this as-is, changing master to your own hostname
    <configuration>
      <property>
        <!-- the hostname of the YARN ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property>
      <property>
        <!-- how the NodeManager fetches data -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>

Edit the slaves file

    vim slaves
    # change localhost to master
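
Equivalently, if the file still contains only the default localhost entry, this one-line sketch makes the same change:

    # Replace the default localhost entry with master
    sed -i 's/^localhost$/master/' slaves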

IV. Checking and Starting

Use the jps command to check which daemons are running:

    [root@localhost hadoop]# jps
    50451 Jps
    # Hadoop isn't started yet, so no daemons are listed

Format the NameNode

The NameNode must be formatted the first time you install Hadoop. After that, do not casually rerun this command: formatting wipes the existing HDFS metadata.

    [root@localhost hadoop]# hadoop namenode -format
    # on success, the log contains a "successfully formatted" line
    21/03/29 02:43:46 INFO common.Storage: Storage directory /usr/local/hadoop/hadoop-2.7.7/tmp/dfs/name has been successfully formatted.
    21/03/29 02:43:46 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hadoop-2.7.7/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    21/03/29 02:43:46 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hadoop-2.7.7/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
    21/03/29 02:43:46 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    21/03/29 02:43:46 INFO util.ExitUtil: Exiting with status 0
    21/03/29 02:43:46 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master/192.168.23.128
    ************************************************************/
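
If you ever do need to reformat (for example, after changing hadoop.tmp.dir), a minimal sketch of a safe order of operations, assuming the tmp directory configured above:

    # Stop HDFS, wipe the old metadata, then format again
    stop-dfs.sh
    rm -rf /usr/local/hadoop/hadoop-2.7.7/tmp/*
    hadoop namenode -format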

Start Hadoop

Because Hadoop's environment variables are already configured, you don't need to be in the sbin directory; the following command starts HDFS from any directory:

    start-dfs.sh
    [root@localhost hadoop]# jps
    50788 DataNode
    51093 Jps
    50649 NameNode
    50970 SecondaryNameNode
    # if you see the daemons above, the pseudo-distributed setup is working

Open a browser and go to http://master:50070; you should see the NameNode status page.
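
If you are working in an SSH session without a browser, a quick reachability check from the shell (a sketch; an HTTP status of 200 means the NameNode web UI is up):

    [root@localhost hadoop]# curl -s -o /dev/null -w "%{http_code}\n" http://master:50070
    200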

V. Example Test

Now that Hadoop is installed, let's run the official wordcount example and get a feel for what Hadoop can do.

1. Create a local file test.txt:

I chose the path /home/dr/test.txt.

Run:

    vim test.txt
    # enter the following content
    i like hadoop
    and i like study
    i like java
    i like jdk
    i like java jdk hadoop
    # save and quit

2. Upload the file to HDFS

My test.txt is under /home/dr/.

First create an input directory in the HDFS root:

hdfs dfs -mkdir /input

Then upload the file (make sure your current directory is /home/dr):

hdfs dfs -put ./test.txt /input

Then check that the upload succeeded:

hdfs dfs -ls /input

3. Run the job

When we started Hadoop above, we only started HDFS, not YARN, so start YARN first:

start-yarn.sh
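
After start-yarn.sh, jps should additionally list a ResourceManager and a NodeManager alongside the HDFS daemons. A sketch of typical output (the process IDs are illustrative and will differ on your machine):

    [root@localhost hadoop]# jps
    50788 DataNode
    50649 NameNode
    50970 SecondaryNameNode
    51210 ResourceManager
    51320 NodeManager
    51455 Jps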

Then run the example jar:

    # The jar name under the share folder matches your Hadoop version, so adjust it if yours differs
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output

You can see that Hadoop is running; the "successfully" at the end of the output means the job succeeded!

4. View the output

After the job succeeds, Hadoop creates two files under the specified /output path. Let's take a look:

hdfs dfs -ls /output

The first file, /output/_SUCCESS, is an empty marker file that signals the job succeeded; we can ignore it.

The second file, /output/part-r-00000, holds the actual output.

View the result:

hdfs dfs -cat /output/part-r-00000

The output lists how many times each word occurred.
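
One caveat before rerunning the job: MapReduce refuses to start if the output directory already exists, so delete /output first:

    # Remove the old output so wordcount can run again with the same paths
    hdfs dfs -rm -r /output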


This article draws on "Detailed walkthrough of installing Hadoop in pseudo-distributed mode on CentOS Linux".
