Hadoop is an open-source software framework under Apache, implemented in Java: a platform for developing and running software that processes data at scale. It lets you process large datasets in a distributed fashion across clusters of many machines using a simple programming model. Before you start, make sure the prerequisites for the Hadoop cluster have been completed; the details are covered in my earlier article: http://t.csdn.cn/FzkES
Hadoop installation package: https://pan.baidu.com/s/12R1q8ygEnosP9pVbX5rvxg
Extraction code: LZZY
1. Upload the Hadoop archive to the CentOS 7 system; you can drag the archive straight into the system root directory, as shown in the figure. (The installation package is on Baidu Netdisk via the link above.) You can also download it from the Apache archive with the wget command, but that download tends to be very slow, so I recommend the first option. The full wget command is: wget http://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
2. Next, extract the Hadoop archive. Use the command: tar -zxvf hadoop-3.1.3.tar.gz -C /export/server/ to extract it into the /export/server directory. A normal extraction looks like the figure below; if extraction fails, the archive was most likely corrupted during upload, so delete it from the virtual machine and upload it again. You can also test the archive before extracting, as sketched below.
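A quick integrity check before extracting (optional; a corrupt upload will make tar exit with an error here):
- tar -tzf hadoop-3.1.3.tar.gz > /dev/null && echo "archive OK"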
3. As with the JDK installation earlier, create a symlink for Hadoop to make later operations more convenient.
ln -s /export/server/hadoop-3.1.3 /export/server/hadoop
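To confirm the link was created correctly, list it:
- ls -l /export/server/hadoop
- # expected output ends with: hadoop -> /export/server/hadoop-3.1.3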
1. First, change into Hadoop's configuration directory:
cd /export/server/hadoop/etc/hadoop
(Note the lowercase hadoop at the end; Linux paths are case-sensitive.) The etc directory inside the installation is where the configuration files are stored.
2. Modify the configuration file hadoop-env.sh.
Open it with: vim hadoop-env.sh and add the following block at the top of the file.
- # Java installation path
- export JAVA_HOME=/export/server/jdk
- # Hadoop installation path
- export HADOOP_HOME=/export/server/hadoop
- # Hadoop HDFS configuration file path
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- # Hadoop YARN configuration file path
- export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
- # Hadoop YARN log directory
- export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
- # Hadoop HDFS log directory
- export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
-
- # Users that the Hadoop daemons start as
- export HDFS_NAMENODE_USER=root
- export HDFS_DATANODE_USER=root
- export HDFS_SECONDARYNAMENODE_USER=root
- export YARN_RESOURCEMANAGER_USER=root
- export YARN_NODEMANAGER_USER=root
- export YARN_PROXYSERVER_USER=root
Once inserted, save and quit. (To save in vim, press Esc to leave insert mode, then type :wq, where w writes the file and q quits. I will publish a separate post later covering vim usage and its common commands.) A quick check that the edits landed is sketched below.
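A simple sanity check (not required) that the new variables are present:
- grep -E "JAVA_HOME|HADOOP_HOME|_USER" /export/server/hadoop/etc/hadoop/hadoop-env.sh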
3. Modify the configuration file core-site.xml, replacing everything in it with the code below. fs.defaultFS points HDFS clients at the NameNode (node1 in this cluster), and io.file.buffer.size sets a 128 KB (131072-byte) I/O buffer.
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://node1:8020</value>
- </property>
-
- <property>
- <name>io.file.buffer.size</name>
- <value>131072</value>
- <description></description>
- </property>
- </configuration>
4. Modify the configuration file hdfs-site.xml, replacing everything in it with the code below.
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
-
- <configuration>
- <property>
- <name>dfs.datanode.data.dir.perm</name>
- <value>700</value>
- </property>
-
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>/data/nn</value>
- <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
- </property>
-
- <property>
- <name>dfs.namenode.hosts</name>
- <value>node1,node2,node3</value>
- <description>List of permitted DataNodes.</description>
- </property>
-
- <property>
- <name>dfs.blocksize</name>
- <value>268435456</value>
- <description></description>
- </property>
-
- <property>
- <name>dfs.namenode.handler.count</name>
- <value>100</value>
- <description></description>
- </property>
-
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>/data/dn</value>
- </property>
- </configuration>
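Once HDFS is up (later in this guide), you can confirm that a value was picked up, e.g. the 256 MB (268435456-byte) block size configured above:
- hdfs getconf -confKey dfs.blocksize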
5. Modify the configuration file mapred-env.sh, adding the following code at the top.
- export JAVA_HOME=/export/server/jdk
- export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
- export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
6. Modify the configuration file mapred-site.xml, replacing everything in it with the code below.
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- <description></description>
- </property>
-
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>node1:10020</value>
- <description></description>
- </property>
-
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>node1:19888</value>
- <description></description>
- </property>
-
- <property>
- <name>mapreduce.jobhistory.intermediate-done-dir</name>
- <value>/data/mr-history/tmp</value>
- <description></description>
- </property>
-
- <property>
- <name>mapreduce.jobhistory.done-dir</name>
- <value>/data/mr-history/done</value>
- <description></description>
- </property>
-
- <property>
- <name>yarn.app.mapreduce.am.env</name>
- <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
- </property>
-
- <property>
- <name>mapreduce.map.env</name>
- <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
- </property>
-
- <property>
- <name>mapreduce.reduce.env</name>
- <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
- </property>
- </configuration>
7. Modify the configuration file yarn-env.sh, replacing everything in it with the code below.
- export JAVA_HOME=/export/server/jdk
- export HADOOP_HOME=/export/server/hadoop
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
- export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
8. Modify the configuration file yarn-site.xml, replacing everything in it with the code below.
- <?xml version="1.0"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
- <configuration>
-
- <!-- Site specific YARN configuration properties -->
- <property>
- <name>yarn.log.server.url</name>
- <value>http://node1:19888/jobhistory/logs</value>
- <description></description>
- </property>
-
- <property>
- <name>yarn.web-proxy.address</name>
- <value>node1:8089</value>
- <description>proxy server hostname and port</description>
- </property>
-
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- <description>Configuration to enable or disable log aggregation</description>
- </property>
-
- <property>
- <name>yarn.nodemanager.remote-app-log-dir</name>
- <value>/tmp/logs</value>
- <description>HDFS directory where aggregated application logs are placed.</description>
- </property>
- <!-- Site specific YARN configuration properties -->
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>node1</value>
- <description></description>
- </property>
-
- <property>
- <name>yarn.resourcemanager.scheduler.class</name>
- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
- </value>
- <description></description>
- </property>
-
- <property>
- <name>yarn.nodemanager.local-dirs</name>
- <value>/data/nm-local</value>
- <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
- </property>
-
- <property>
- <name>yarn.nodemanager.log-dirs</name>
- <value>/data/nm-log</value>
- <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
- </property>
-
- <property>
- <name>yarn.nodemanager.log.retain-seconds</name>
- <value>10800</value>
- <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.</description>
- </property>
-
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- <description>Shuffle service that needs to be set for Map Reduce applications.
- </description>
- </property>
- </configuration>
9. Put the following hostnames into the workers configuration file; a non-interactive way to write the file is sketched after the list.
- node1
- node2
- node3
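A minimal sketch of writing the workers file without an editor (assuming the standard configuration path from earlier):
- cat > /export/server/hadoop/etc/hadoop/workers <<'EOF'
- node1
- node2
- node3
- EOF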
1. Distribute Hadoop to the other virtual machines; this step only needs to be performed on node1. Use the command: cd /export/server to enter the Hadoop installation directory, then copy the files with scp.
Distribute to node2:
scp -r hadoop-3.1.3 node2:`pwd`/
Distribute to node3:
scp -r hadoop-3.1.3 node3:`pwd`/
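These copies rely on the passwordless SSH configured in the prerequisite article. A quick way to confirm they arrived:
- ssh node2 "ls -d /export/server/hadoop-3.1.3"
- ssh node3 "ls -d /export/server/hadoop-3.1.3"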
2. Once distribution is complete, create the same Hadoop symlink on node2 and node3:
ln -s /export/server/hadoop-3.1.3 /export/server/hadoop
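If you prefer, both links can be created from node1 over SSH instead of logging into each node (again assuming passwordless SSH):
- ssh node2 "ln -s /export/server/hadoop-3.1.3 /export/server/hadoop"
- ssh node3 "ln -s /export/server/hadoop-3.1.3 /export/server/hadoop"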
3. Create the working directories.
On node1, create the following directories:
- mkdir -p /data/nn
- mkdir -p /data/dn
- mkdir -p /data/nm-log
- mkdir -p /data/nm-local
On node2, create the following directories:
- mkdir -p /data/dn
- mkdir -p /data/nm-log
- mkdir -p /data/nm-local
On node3, create the following directories:
- mkdir -p /data/dn
- mkdir -p /data/nm-log
- mkdir -p /data/nm-local
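As with the symlinks, the node2/node3 directories can also be created from node1 over SSH (a convenience sketch):
- ssh node2 "mkdir -p /data/dn /data/nm-log /data/nm-local"
- ssh node3 "mkdir -p /data/dn /data/nm-log /data/nm-local"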
4. Configure the environment variables.
On node1, node2, and node3, edit /etc/profile and append the following lines at the very bottom of the file:
- export HADOOP_HOME=/export/server/hadoop
- export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Note that this must be done on all three virtual machines. After saving and quitting, run: source /etc/profile to make it take effect. The check below confirms PATH is set up correctly.
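If PATH is correct, the hadoop command now works from any directory:
- hadoop version
- # the first line of output should read: Hadoop 3.1.3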
5. Format the NameNode.
This only needs to be done on node1, using the command: hadoop namenode -format. The hadoop command is a program in $HADOOP_HOME/bin; because PATH was configured above, you can run it from any location.
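A successful format populates the NameNode metadata directory configured earlier in hdfs-site.xml (dfs.namenode.name.dir):
- ls /data/nn
- # a "current" directory with fsimage files should now exist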
1. Start the HDFS cluster; run this on node1 only:
- start-dfs.sh
- # To stop it, run
- stop-dfs.sh
2. Start the YARN cluster; run this on node1 only:
- start-yarn.sh
- # To stop it, run
- stop-yarn.sh
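At this point, running jps on node1 should list the running daemons (the exact set depends on defaults):
- jps
- # typically: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager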
3. Start the history server:
- mapred -daemon start historyserver
- # To stop it, run
- mapred -daemon stop historyserver
Note: if jps shows that the history server did not start, cd into the sbin directory under the Hadoop installation and run: mr-jobhistory-daemon.sh start historyserver
4. Start the web proxy server:
- yarn-daemon.sh start proxyserver
- # To stop it, run
- yarn-daemon.sh stop proxyserver
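Finally, the web UIs are a convenient health check; with Hadoop 3.x defaults they are served at:
- # http://node1:9870  (HDFS NameNode UI)
- # http://node1:8088  (YARN ResourceManager UI)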
That completes the Hadoop cluster setup. Thank you for reading my blog! I hope this post gave you something new to think about. If you have questions about anything covered here, or topics you would like to discuss further, leave a comment and I will do my best to answer. I look forward to exploring more interesting topics with you. See you next time!