Installing hadoop2.9.1 + hive2.3.4 + kafka0.10.2.2 + spark2.2.2 + zookeeper3.4.9
Environment: CentOS 7, Java 1.8, Scala 2.11.12
VM image: CentOS 7, CentOS-7-x86_64-Minimal-1611.iso
Install five of them:
192.168.0.105 hadoop01
192.168.0.106 hadoop02
192.168.0.107 hadoop03
192.168.0.108 hadoop04
192.168.0.109 hadoop05
Role assignment:
hadoop01: active NameNode, ZKFC, Kafka, Spark Master, ResourceManager, JobHistoryServer
hadoop02: standby NameNode, ZKFC, Kafka, ResourceManager, Spark Worker
hadoop03: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Spark Worker
hadoop04: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Spark Worker
hadoop05: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Spark Worker
Reboot once the VM installation finishes.
Configure the network:
vim /etc/sysconfig/network-scripts/ifcfg-e***
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=b7e7186c-6ef6-44d6-8855-701153e5827d
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.0.105
PREFIX=24
GATEWAY=192.168.0.1
DNS1=114.114.114.114
DNS2=8.8.8.8
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
Restart the network:
service network restart
Configure DNS
Check NetworkManager's status: systemctl status NetworkManager.service
Check the interfaces NetworkManager manages: nmcli dev status
Check the connections NetworkManager manages: nmcli connection show
Set the DNS servers: nmcli con mod ens33 ipv4.dns "114.114.114.114 8.8.8.8"
Apply the DNS configuration: nmcli con up ens33
Configure hosts (every node needs the same entries; a distribution sketch follows the list)
vi /etc/hosts
192.168.0.105 hadoop01
192.168.0.106 hadoop02
192.168.0.107 hadoop03
192.168.0.108 hadoop04
192.168.0.109 hadoop05
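Since every node needs identical hosts entries, here is a minimal sketch that pushes the file out from hadoop01 (a hypothetical loop; you will be prompted for each root password until passwordless SSH is set up later):
for host in 192.168.0.106 192.168.0.107 192.168.0.108 192.168.0.109; do
scp /etc/hosts root@$host:/etc/hosts   # overwrite the default hosts file on each node
done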
Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
Turn off the firewall on the Windows host as well.
Connect to the VMs with SecureCRT or Xshell.
Configure yum
yum clean all
yum makecache
yum install -y wget
You can also install any other tools you like.
All software is installed under /usr/local/.
Install Java
Unpack the JDK.
Configure the environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
Verify the installation: java -version
Now install the other four VMs; the process is identical.
Once all machines are up, set up passwordless SSH login (a loop version is sketched below):
ssh-keygen -t rsa (run on the master node; press Enter through every prompt)
ssh-copy-id 192.168.0.106 (copy the key to each of the other nodes)
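A sketch that loops the key distribution over all five hosts (assumes the hostnames from /etc/hosts above; run after ssh-keygen on hadoop01). Because hdfs-site.xml below configures sshfence, hadoop02 must also be able to SSH to hadoop01 without a password, so repeat ssh-keygen and ssh-copy-id on hadoop02 as well:
for host in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05; do
ssh-copy-id root@$host      # enter the root password once per host
done
ssh hadoop02 hostname        # should print "hadoop02" with no password prompt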
Deploy ZooKeeper on hadoop03, hadoop04, and hadoop05:
tar -zxvf zookeeper-3.4.9.tar.gz
mv zookeeper-3.4.9 zookeeper
vi ~/.bashrc
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source ~/.bashrc
vi zoo.cfg (in $ZOOKEEPER_HOME/conf; create it from zoo_sample.cfg if it does not exist)
dataDir=/home/data/zookeeper
dataLogDir=/home/log/zookeeper
server.1=hadoop03:2888:3888
server.2=hadoop04:2888:3888
server.3=hadoop05:2888:3888
mkdir -p /home/data/zookeeper
mkdir -p /home/log/zookeeper
cd /home/data/zookeeper
echo 1 > myid
scp -r /usr/local/zookeeper hadoop04:/usr/local, update the environment variables, and on hadoop04 set the myid content to 2 (echo 2 > myid)
scp -r /usr/local/zookeeper hadoop05:/usr/local, update the environment variables, and on hadoop05 set the myid content to 3 (echo 3 > myid)
Run on all three machines: zkServer.sh start
zkServer.sh status (shows the cluster status and whether the node is leader or follower)
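Beyond zkServer.sh status, you can poke the ensemble with ZooKeeper's four-letter-word commands (a sketch; assumes nc is available, e.g. via yum install -y nmap-ncat):
for host in hadoop03 hadoop04 hadoop05; do
echo -n "$host: "
echo ruok | nc $host 2181 && echo          # a healthy server answers "imok"
done
echo stat | nc hadoop03 2181 | grep Mode   # prints "Mode: leader" or "Mode: follower"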
Deploy the Hadoop cluster:
First install hadoop2.9.1 (HA).
Download the package: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
Unpack the package.
Rename the directory to hadoop.
cd /usr/local/hadoop/etc/hadoop
Create this directory on hadoop01 through hadoop05: mkdir -p /home/apps/hadoop/tmp
Create this directory on hadoop01 through hadoop05: mkdir -p /home/apps/hadoop/journaldata
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/apps/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
</property>
<!-- Maximum connection retries; raised from the default of 10 to 20 -->
<property>
<name>ipc.client.connect.max.retries</name>
<value>20</value>
<description>Indicates the number of retries a client will make to establish
a server connection.
</description>
</property>
<!-- Interval between connection retries: 1000 ms = 1 s -->
<property>
<name>ipc.client.connect.retry.interval</name>
<value>1000</value>
<description>Indicates the number of milliseconds a client will wait for
before retrying to establish a server connection.
</description>
</property>
</configuration>
vim hadoop-env.sh
export JAVA_HOME=/usr/java/latest
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>hadoop01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>hadoop02:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>hadoop02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop03:8485;hadoop04:8485;hadoop05:8485/ns1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/apps/hadoop/journaldata</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Specify the RM cluster id (separate clusters must not share an id) -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<!-- Logical ids for the two RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop02</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
vim mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- HTTP address of the JobHistory server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
<!-- Enable uber mode (an optimization for small jobs) -->
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<!-- Maximum number of map tasks for a job to run in uber mode -->
<property>
<name>mapreduce.job.ubertask.maxmaps</name>
<value>9</value>
</property>
<!-- Maximum number of reduce tasks for a job to run in uber mode -->
<property>
<name>mapreduce.job.ubertask.maxreduces</name>
<value>1</value>
</property>
</configuration>
vim slaves
hadoop03
hadoop04
hadoop05
Copy this hadoop directory to hadoop02 through hadoop05 (scp -r /usr/local/hadoop hadoop02:/usr/local/ and likewise for the others) and update the environment variables on each node, as in the sketch below.
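A hypothetical loop that distributes the configured directory and the environment file from hadoop01 in one pass (assumes the variables live in ~/.bashrc as above):
for host in hadoop02 hadoop03 hadoop04 hadoop05; do
scp -r /usr/local/hadoop $host:/usr/local/
scp ~/.bashrc $host:~/.bashrc    # then run "source ~/.bashrc" on that node
done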
Start the JournalNode cluster on hadoop03, hadoop04, and hadoop05:
sbin/hadoop-daemon.sh start journalnode
Format the NameNode on hadoop01: bin/hdfs namenode -format
Start the freshly formatted NameNode: sbin/hadoop-daemon.sh start namenode
On the second machine, hadoop02, sync the NameNode metadata: bin/hdfs namenode -bootstrapStandby
Start the NameNode on hadoop02: sbin/hadoop-daemon.sh start namenode
Format the ZKFC state in ZooKeeper (run on hadoop01):
hdfs zkfc -formatZK
Start the HDFS cluster (run on hadoop01):
sbin/start-dfs.sh. This script automatically starts a NameNode and a ZKFC on each of hadoop01 and hadoop02, then starts a DataNode on each of hadoop03, hadoop04, and hadoop05.
Then switch the NameNode state manually (you can also switch the second node to active from the first machine; it is all one cluster):
bin/hdfs haadmin -transitionToActive nn1 ## switch to active
bin/hdfs haadmin -transitionToStandby nn1 ## switch to standby
Note: if the switch is refused, force it with bin/hdfs haadmin -transitionToActive nn2 --forceactive
You can also query the NameNode state directly from the command line: bin/hdfs haadmin -getServiceState nn1
Commands to see which NameNode is active and which is standby (a failover smoke test is sketched after these):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
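With one NameNode active and one standby, you can optionally smoke-test automatic failover (a destructive sketch for a fresh cluster only; the kill target and timing are illustrative):
hdfs haadmin -getServiceState nn1                        # expect "active"
kill -9 $(jps | awk '$2 == "NameNode" {print $1}')       # on hadoop01: simulate a NameNode crash
sleep 10                                                 # give the ZKFCs time to fail over
hdfs haadmin -getServiceState nn2                        # should now report "active"
/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode   # restart nn1; it rejoins as standby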
Once everything checks out, start YARN:
hadoop01:start-yarn.sh
hadoop02:yarn-daemon.sh start resourcemanager
Important!!!
If you format the NameNode repeatedly, shut down HDFS first, stop the JournalNodes on all three nodes, and then delete the following files (a helper script is sketched after these commands):
/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
rm -rf /home/apps/hadoop/tmp/*
rm -rf /home/apps/hadoop/journaldata/*
Then start the JournalNodes again on hadoop03, hadoop04, and hadoop05:
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
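A hypothetical helper that runs the whole cleanup from hadoop01 (it wipes all HDFS metadata, so double-check the paths before running):
/usr/local/hadoop/sbin/stop-dfs.sh
for host in hadoop03 hadoop04 hadoop05; do
ssh $host '/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode'
ssh $host 'rm -rf /home/apps/hadoop/tmp/* /home/apps/hadoop/journaldata/*'
done
rm -rf /home/apps/hadoop/tmp/*                     # NameNode data on hadoop01
ssh hadoop02 'rm -rf /home/apps/hadoop/tmp/*'      # and on hadoop02
for host in hadoop03 hadoop04 hadoop05; do
ssh $host '/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode'
done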
At this point, the Hadoop HA cluster installation is complete.
Install MySQL on CentOS 7:
https://www.cnblogs.com/yowamushi/p/8043054.html
Install the MySQL server with yum:
yum install -y mysql-server
service mysqld start
chkconfig mysqld on
Install the MySQL connector with yum:
yum install -y mysql-connector-java
Copy the MySQL connector into Hive's lib directory:
cp /usr/share/java/mysql-connector-java-5.1.17.jar /usr/local/hive/lib (mind the version; I ended up using mysql-connector-java-5.1.46 instead, see the pitfall note below)
Change the MySQL password: set password for root@localhost = password('123456');
Create the Hive metastore database in MySQL and grant Hive access:
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'root'@'%' identified by '123456';
grant all privileges on hive_metadata.* to 'root'@'localhost' identified by '123456';
grant all privileges on hive_metadata.* to 'root'@'hadoop01' identified by '123456';
flush privileges;
use hive_metadata;
Install hive2.3.4: https://blog.csdn.net/pengjunlee/article/details/81607890
Detailed explanations of the Hive configuration options: https://blog.csdn.net/aaa1117a8w5s6d/article/details/16884401
Configure the environment variables:
vim /etc/profile
The articles above are very detailed.
Hive is a single-node install; there is no need to copy it to the other nodes.
Watch out for a big pitfall: once Hive's metadata is stored in MySQL, the MySQL connector jar must be copied into Hive's lib directory, and the version matters; I used mysql-connector-java-5.1.46-bin.jar. A minimal hive-site.xml sketch follows.
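For reference, a minimal hive-site.xml sketch (in $HIVE_HOME/conf) matching the hive_metadata database and grants above; the MySQL host hadoop01 is an assumption, so point the URL at wherever MySQL actually runs:
<configuration>
<!-- JDBC connection for the metastore; the hadoop01 host is assumed -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop01:3306/hive_metadata?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>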
Install Scala 2.11.12
Unpack.
Configure the environment variables.
Verify the installation: scala -version
Copy the scala directory to the other nodes and configure the environment variables there.
Install Kafka: kafka_2.11-0.10.2.2.tgz
Unpack.
Rename the directory to kafka.
vim /usr/local/kafka/config/server.properties
zookeeper.connect=192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181
In server.properties, set broker.id to a unique value on each broker, e.g. 1 through 5 across the five machines (when you copy the directory to another machine, broker.id must be changed; no two brokers may share an id).
Install slf4j
Upload slf4j-1.7.6.zip to /usr/local:
unzip slf4j-1.7.6.zip
Copy slf4j-nop-1.7.6.jar from the slf4j package into Kafka's libs directory.
Copy the kafka directory to the other machines, remembering to change broker.id on each, as in the sketch below.
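A hypothetical one-pass distribution that also rewrites broker.id on each target (assumes broker.id=1 is already set on hadoop01):
id=2
for host in hadoop02 hadoop03 hadoop04 hadoop05; do
scp -r /usr/local/kafka $host:/usr/local/
ssh $host "sed -i 's/^broker.id=.*/broker.id=$id/' /usr/local/kafka/config/server.properties"
id=$((id+1))
done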
Run the following on each of the five machines: nohup bin/kafka-server-start.sh config/server.properties &
Check that Kafka works with the basic command-line tools:
bin/kafka-topics.sh --zookeeper 192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list 192.168.0.107:9092,192.168.0.108:9092,192.168.0.109:9092 --topic TestTopic
bin/kafka-console-consumer.sh --zookeeper 192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181 --topic TestTopic --from-beginning
The test passes.
Install spark2.2.2
Unpack.
vi spark-env.sh
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.105
export SPARK_WORKER_MEMORY=2g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
vim slaves (since hadoop01 is the active NameNode and Spark uses quite a lot of memory, hadoop01 is not made a Spark worker for now)
hadoop02
hadoop03
hadoop04
hadoop05
Start Spark:
sbin/start-all.sh
Check that the master web UI is listening on port 8080.
Check that spark-shell and spark-sql start correctly; a quick smoke test is sketched below.
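For a quick smoke test beyond the shells, submit the bundled SparkPi example to the standalone master (the examples jar ships with Spark 2.2.2; verify the exact filename in your distribution):
cd /usr/local/spark
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.0.105:7077 examples/jars/spark-examples_2.11-2.2.2.jar 100
# look for "Pi is roughly 3.14..." in the output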
Environment variables:
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Start the cluster:
hadoop03:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop04:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop05:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop01:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
hadoop02:
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
(The JournalNodes on hadoop03 through hadoop05 are started by start-dfs.sh; only if one is missing, run /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode on that node.)
hadoop01:
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
nohup /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties & (run on every Kafka broker, not only hadoop01)
/usr/local/spark/sbin/start-all.sh
Stop the cluster. The order matters!!! (A wrapper script enforcing this order is sketched after the sequence.)
hadoop01:
/usr/local/kafka/bin/kafka-server-stop.sh (run on every Kafka broker)
/usr/local/spark/sbin/stop-all.sh
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
/usr/local/hadoop/sbin/stop-yarn.sh
hadoop02:
/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
hadoop01:
/usr/local/hadoop/sbin/stop-dfs.sh
(stop-dfs.sh also stops the ZKFCs and the JournalNodes on hadoop03 through hadoop05; if a JournalNode is left running, stop it with /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode on that node.)
hadoop03:
/usr/local/zookeeper/bin/zkServer.sh stop
hadoop04:
/usr/local/zookeeper/bin/zkServer.sh stop
hadoop05:
/usr/local/zookeeper/bin/zkServer.sh stop
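To make the order hard to get wrong, the shutdown can be wrapped in one script run from hadoop01 (a sketch; assumes passwordless SSH to all nodes and the paths used throughout this guide):
#!/bin/bash
# stop-cluster.sh: hypothetical wrapper enforcing the shutdown order above
for host in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05; do
ssh $host '/usr/local/kafka/bin/kafka-server-stop.sh'   # Kafka on every broker
done
/usr/local/spark/sbin/stop-all.sh
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
/usr/local/hadoop/sbin/stop-yarn.sh
ssh hadoop02 '/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager'
/usr/local/hadoop/sbin/stop-dfs.sh                      # also stops ZKFCs and JournalNodes
for host in hadoop03 hadoop04 hadoop05; do
ssh $host '/usr/local/zookeeper/bin/zkServer.sh stop'
done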