If, like me, you need to stand up a highly available Hadoop cluster with ZooKeeper inside Docker containers, the steps below walk through the whole process.
Download the Hadoop release locally. If you plan to do rolling upgrades of the cluster later, pick a reasonably recent version; I chose 3.2.4. Grab the binary tarball (hadoop-3.2.4.tar.gz), since that is the file the Dockerfile below unpacks.
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
Download the ZooKeeper release locally; the latest stable version is fine. I chose 3.7.1.
wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
Download the JDK locally. Hadoop requires a JDK within a specific version range; here 8u151 is used.
wget https://repo.huaweicloud.com/java/jdk/8u151-b12/jdk-8u151-linux-x64.tar.gz
Have a working Docker environment ready (not covered here).
Pull the CentOS 7 image:
docker pull centos:7
Create a Dockerfile for building the image. Its purpose is to install and configure the SSH service so that the containers accept SSH connections.
vi Dockerfile
The file contents are as follows:
FROM centos:7
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:111111" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
RUN ssh-keygen -t dsa -N "" -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
Build a new image named centos7-ssh from the Dockerfile above:
docker build -t="centos7-ssh" .
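To sanity-check the image before moving on, you can start a throwaway container and confirm that sshd is running (the container name ssh-test is just an example):
docker run -d --name ssh-test centos7-ssh
docker top ssh-test
docker rm -f ssh-test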
Edit the Dockerfile again. This time we add the JDK, Hadoop, and ZooKeeper packages we just downloaded and build another image.
vi Dockerfile
The file contents are as follows:
FROM centos7-ssh
ADD jdk-8u151-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_151 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ADD hadoop-3.2.4.tar.gz /usr/local
RUN mv /usr/local/hadoop-3.2.4 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ADD apache-zookeeper-3.7.1-bin.tar.gz /usr/local
RUN mv /usr/local/apache-zookeeper-3.7.1-bin /usr/local/zookeeper
ENV ZOOKEEPER_HOME /usr/local/zookeeper
ENV PATH $JAVA_HOME/bin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
RUN yum install -y which sudo
Build the new image and name it hadoop-3.2.4:
docker build -t="hadoop-3.2.4" .
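If you want to confirm that both images were built:
docker images | grep -E 'centos7-ssh|hadoop-3.2.4'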
Run three containers from the hadoop-3.2.4 image, named hadoop1, hadoop2, and hadoop3:
docker run --name hadoop1 --hostname hadoop1 -d -P -p 50070:50070 -p 8088:8088 hadoop-3.2.4
docker run --name hadoop2 --hostname hadoop2 -d -P hadoop-3.2.4
docker run --name hadoop3 --hostname hadoop3 -d -P hadoop-3.2.4
Run docker ps to check that the three containers are up.
Most of the remaining steps are performed inside the containers. To enter a container, hadoop1 for example, run:
docker exec -it hadoop1 bash
Enter hadoop1 and set up passwordless SSH between the containers.
First look up the IP addresses of hadoop1 through hadoop3. Note that these addresses are assigned automatically when the containers are created and change when a container restarts; if you need fixed addresses, look up how to assign static IPs to Docker containers.
cat /etc/hosts
Query the IP inside each of the three containers; in my case hadoop1 through hadoop3 correspond to 172.17.0.7 through 172.17.0.9.
Write the hostname-to-IP mappings into /etc/hosts on hadoop1:
vi /etc/hosts
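With the example addresses above, the appended entries would look like this (use the IPs you actually got):
172.17.0.7 hadoop1
172.17.0.8 hadoop2
172.17.0.9 hadoop3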
Run the following commands in order:
ssh-keygen (press Enter at every prompt; no input is needed)
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop1 (answer yes, then enter the password 111111)
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop2
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop3
You can now ssh to the other nodes without entering a password; give it a try.
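A minimal check from hadoop1 (hostnames as written into /etc/hosts above):
ssh hadoop2 hostname
ssh hadoop3 hostname
Each command should print the remote hostname without prompting for a password.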
Repeat the steps above on hadoop2 and hadoop3 so that every node can reach every other node over SSH without a password.
Configure environment variables inside each container:
vi /etc/profile
Add the following line to the file:
export PATH=$PATH:$HADOOP_HOME/sbin
Apply the change:
source /etc/profile
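A quick way to confirm the sbin scripts are now on the PATH (which was installed into the image earlier):
which start-all.sh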
Create a few directories for HDFS name, data, and journal storage (do this in each container):
mkdir /usr/local/hdfs
mkdir /usr/local/hdfs/name
mkdir /usr/local/hdfs/data
mkdir /usr/local/hdfs/journaldata
The Hadoop configuration files live under /usr/local/hadoop/etc/hadoop/:
cd /usr/local/hadoop/etc/hadoop/
Edit the following files with vi, one by one.
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <property>
    <name>hadoop.zk.address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <property>
    <name>ha.zookeeper.parent-znode</name>
    <value>/hadoop-ha</value>
  </property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_SHELL_EXECNAME=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.nn2</name>
    <value>hadoop2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.nn2</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/cluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hdfs/journaldata</value>
  </property>
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>30</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>3000</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>xzk</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop2:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
workers
hadoop1
hadoop2
hadoop3
Distribute the modified files to the other two nodes:
scp * hadoop2:/usr/local/hadoop/etc/hadoop/
scp * hadoop3:/usr/local/hadoop/etc/hadoop/
That completes the Hadoop configuration; next, configure ZooKeeper. Go to its conf directory and create zoo.cfg:
cd /usr/local/zookeeper/conf
vi zoo.cfg
The file contents are:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/zkdata
dataLogDir=/usr/local/zookeeper/zklog
server.1=hadoop1:2888:3888;2181
server.2=hadoop2:2888:3888;2181
server.3=hadoop3:2888:3888;2181
clientPort=2181
Distribute zoo.cfg to the other two nodes.
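For example, from hadoop1 (assuming the same directory layout on every node):
scp zoo.cfg hadoop2:/usr/local/zookeeper/conf/
scp zoo.cfg hadoop3:/usr/local/zookeeper/conf/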
On each node, create the two directories referenced above for ZooKeeper data and logs:
mkdir -p /usr/local/zookeeper/zkdata /usr/local/zookeeper/zklog
Inside zkdata, create a file named myid. Write 1 into the myid on hadoop1, 2 on hadoop2, and 3 on hadoop3; with more nodes, continue the pattern.
cd /usr/local/zookeeper/zkdata
vi myid
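Equivalently, the file can be written in one command; for example, on hadoop1:
echo 1 > /usr/local/zookeeper/zkdata/myid
(use echo 2 and echo 3 on hadoop2 and hadoop3 respectively).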
Now start ZooKeeper on every node:
zkServer.sh start
Once it is up, you can check the ZooKeeper status on the current node; one node will be the leader and the others followers:
zkServer.sh status
Start the JournalNode service on every node:
hdfs --daemon start journalnode
Format the NameNode on hadoop1:
hdfs namenode -format
Initialize the ZKFC (HA state in ZooKeeper), also on hadoop1:
hdfs zkfc -formatZK
Start the cluster:
start-all.sh
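If the NameNode on hadoop2 does not come up as the standby (its name directory has never been formatted), a common fix is to bootstrap it from hadoop1 while the NameNode on hadoop1 is running, then start it again. Run these on hadoop2:
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode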
Check the running daemons on each node.
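jps is handy for this. With the configuration above you would expect roughly QuorumPeerMain, JournalNode, DataNode, and NodeManager on all three nodes, plus NameNode, DFSZKFailoverController, and ResourceManager on hadoop1 and hadoop2:
jps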
There are plenty of test programs under /usr/local/hadoop/share/hadoop/mapreduce/:
cd /usr/local/hadoop/share/hadoop/mapreduce/
Test write throughput:
hadoop jar hadoop-mapreduce-client-jobclient-3.2.4-tests.jar TestDFSIO -write -nrFiles 10 -size 1GB
Test read throughput:
hadoop jar hadoop-mapreduce-client-jobclient-3.2.4-tests.jar TestDFSIO -read -nrFiles 10 -size 1GB
Note that the write test must run before the read test, and the file count and size must match.
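When finished, the TestDFSIO output in HDFS can be removed with the same jar:
hadoop jar hadoop-mapreduce-client-jobclient-3.2.4-tests.jar TestDFSIO -clean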