Installing hadoop2.9.1 + hive2.3.4 + kafka0.10.2.2 + spark2.2.2 + zookeeper3.4.9
Environment: CentOS 7, Java 1.8, Scala 2.11.12

Install VMs with CentOS 7 (CentOS-7-x86_64-Minimal-1611.iso).
Install five nodes:
192.168.0.105 hadoop01
192.168.0.106 hadoop02
192.168.0.107 hadoop03
192.168.0.108 hadoop04
192.168.0.109 hadoop05

Role assignments:
hadoop01: active NameNode, ZKFC, Kafka, Master, ResourceManager, JobHistoryServer
hadoop02: standby NameNode, ZKFC, Kafka, ResourceManager, Worker
hadoop03: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Worker
hadoop04: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Worker
hadoop05: DataNode, JournalNode, QuorumPeerMain, NodeManager, Kafka, Worker

Reboot each VM after installation.
Configure the network:
vim /etc/sysconfig/network-scripts/ifcfg-e***
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=b7e7186c-6ef6-44d6-8855-701153e5827d
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.0.105
PREFIX=24
GATEWAY=192.168.0.1
DNS1=114.114.114.114
DNS2=8.8.8.8
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
Restart the network:
service network restart
Configure DNS:
Check NetworkManager status: systemctl status NetworkManager.service
List the devices managed by NetworkManager: nmcli dev status
List the connections managed by NetworkManager: nmcli connection show
Set DNS: nmcli con mod ens33 ipv4.dns "114.114.114.114 8.8.8.8"
Apply the DNS change: nmcli con up ens33

Configure /etc/hosts:
vi /etc/hosts
192.168.0.105 hadoop01
192.168.0.106 hadoop02
192.168.0.107 hadoop03
192.168.0.108 hadoop04
192.168.0.109 hadoop05


Disable the firewall:
systemctl stop firewalld.service
systemctl disable firewalld.service
Also turn off the firewall on the Windows host if needed.
Connect to the VMs with SecureCRT or Xshell.

Configure yum:
yum clean all
yum makecache
yum install -y wget
You can also install any other tools you like.

All software is installed under /usr/local/.
Install Java:
Extract the JDK.
Configure environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin

source ~/.bashrc
Verify the installation: java -version

Now set up the other four VMs; the process is the same.

After all machines are installed, configure passwordless SSH login:
ssh-keygen -t rsa (run on the master node, pressing Enter through all prompts)
ssh-copy-id 192.168.0.106 (copy the key to each slave node)
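To avoid running ssh-copy-id by hand for every node, a minimal sketch, assuming the hostnames from /etc/hosts above and that sshd is running on each node:

# Run on hadoop01 after ssh-keygen; you are prompted for each node's root password once.
for host in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05; do
  ssh-copy-id root@$host    # appends the public key to ~/.ssh/authorized_keys on $host
done
# Quick check: each hostname should print without a password prompt.
for host in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05; do
  ssh root@$host hostname
done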

Deploy the ZooKeeper cluster:

Deploy ZooKeeper on hadoop03, hadoop04, and hadoop05.

tar -zxvf zookeeper-3.4.9.tar.gz
mv zookeeper-3.4.9 zookeeper

vi ~/.bashrc
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source ~/.bashrc

vi zoo.cfg ($ZOOKEEPER_HOME/conf)

dataDir=/home/data/zookeeper
dataLogDir=/home/log/zookeeper
server.1=hadoop03:2888:3888
server.2=hadoop04:2888:3888
server.3=hadoop05:2888:3888
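
For reference, a minimal complete zoo.cfg sketch: the dataDir, dataLogDir and server lines come from this guide, while tickTime, initLimit, syncLimit and clientPort are the zoo_sample.cfg defaults (keep whatever your zoo_sample.cfg already has):

# defaults taken from zoo_sample.cfg
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# values from this guide
dataDir=/home/data/zookeeper
dataLogDir=/home/log/zookeeper
server.1=hadoop03:2888:3888
server.2=hadoop04:2888:3888
server.3=hadoop05:2888:3888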

mkdir -p /home/data/zookeeper
mkdir -p /home/log/zookeeper
cd /home/data/zookeeper
echo 1 > myid

scp -r /usr/local/zookeeper hadoop04:/usr/local, update the environment variables, and on hadoop04 change myid to 2 (echo 2 > myid)
scp -r /usr/local/zookeeper hadoop05:/usr/local, update the environment variables, and on hadoop05 change myid to 3 (echo 3 > myid)

Run on all three machines: zkServer.sh start
zkServer.sh status (check cluster status and leader/follower roles)
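
As an extra sanity check (a sketch, not part of the original steps), connect with the ZooKeeper CLI from any node:

zkCli.sh -server hadoop03:2181

At the zkCli prompt, run ls / and it should list at least [zookeeper].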


Deploy the Hadoop cluster:
First install Hadoop 2.9.1 (HA).
Download the package: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
Extract the package.
Rename the directory to hadoop.
cd /usr/local/hadoop/etc/hadoop

Create this directory on hadoop01~05: mkdir -p /home/apps/hadoop/tmp
Create this directory on hadoop01~05: mkdir -p /home/apps/hadoop/journaldata

vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/apps/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
  </property>

  <!-- Maximum number of connection retries; raised to 20 (default is 10) -->
  <property>
          <name>ipc.client.connect.max.retries</name>
          <value>20</value>
          <description>Indicates the number of retries a client will make to establish
               a server connection.
          </description>
  </property>
  <!-- Interval between connection retries: 1000 ms = 1 s -->
  <property>
          <name>ipc.client.connect.retry.interval</name>
          <value>1000</value>
          <description>Indicates the number of milliseconds a client will wait for
                        before retrying to establish a server connection.
          </description>
  </property>
</configuration>

vim hadoop-env.sh
export JAVA_HOME=/usr/java/latest

vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop03:8485;hadoop04:8485;hadoop05:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/apps/hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>

vim yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id of the RM HA pair (shared by both ResourceManagers) -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <!-- Logical ids of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Hostname of each ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <!-- ZooKeeper quorum used for RM state storage and leader election -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop03:2181,hadoop04:2181,hadoop05:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

vim mapred-site.xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- HTTP address of the JobHistory server web UI -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
  </property>
  <!-- Enable uber mode (an optimization for small jobs) -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Maximum number of maps for a job to qualify for uber mode -->
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>
  </property>
  <!-- Maximum number of reduces for a job to qualify for uber mode -->
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
</configuration>

vim slaves
hadoop03
hadoop04
hadoop05

Copy this hadoop directory to hadoop02~05, e.g. scp -r /usr/local/hadoop hadoop02:/usr/local/, then update the environment variables on each node.
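
A small sketch to copy to all four remaining nodes in one pass, assuming the passwordless SSH configured earlier (the environment variables still have to be edited on each node afterwards):

# run on hadoop01
for host in hadoop02 hadoop03 hadoop04 hadoop05; do
  scp -r /usr/local/hadoop root@$host:/usr/local/
done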

Start the JournalNode daemons on hadoop03, hadoop04 and hadoop05:
sbin/hadoop-daemon.sh start journalnode

Format the NameNode on hadoop01: bin/hdfs namenode -format
Start the freshly formatted NameNode: sbin/hadoop-daemon.sh start namenode
On the second machine, hadoop02, sync the NameNode metadata: bin/hdfs namenode -bootstrapStandby
Start the NameNode on hadoop02: sbin/hadoop-daemon.sh start namenode

Format ZKFC (run on hadoop01):
hdfs zkfc -formatZK

Start the HDFS cluster (run on hadoop01):
sbin/start-dfs.sh (this script automatically starts a NameNode and a ZKFC on both hadoop01 and hadoop02, and a DataNode on each of hadoop03, hadoop04 and hadoop05)

Then manually switch the NameNode state (either node can switch the other to active; it is all one cluster):
bin/hdfs haadmin -transitionToActive nn1    ## switch to active
bin/hdfs haadmin -transitionToStandby nn1   ## switch to standby
Note: if it refuses to switch, run bin/hdfs haadmin -transitionToActive nn2 --forceactive
You can also check a NameNode's state from the command line: bin/hdfs haadmin -getServiceState nn1

Commands to check which NameNode is active and which is standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

Once everything looks good, start YARN:
hadoop01:start-yarn.sh
hadoop02:yarn-daemon.sh start resourcemanager
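
Similar to the NameNode checks above, the ResourceManager HA state can be verified (an extra check, not in the original steps):
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2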


Note!!!
If you have to reformat the NameNode, first shut down HDFS, stop the JournalNodes on the three nodes, and then delete the following files:
/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
rm -rf /home/apps/hadoop/tmp/*
rm -rf /home/apps/hadoop/journaldata/*
Then start the JournalNodes again on hadoop03, hadoop04 and hadoop05:
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
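
A sketch that runs this cleanup on the three JournalNode hosts from hadoop01, assuming passwordless SSH (the rm -rf is destructive, so double-check the paths first; the tmp directories on hadoop01/02 may need the same treatment before reformatting):

for host in hadoop03 hadoop04 hadoop05; do
  ssh root@$host '/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode'
  ssh root@$host 'rm -rf /home/apps/hadoop/tmp/* /home/apps/hadoop/journaldata/*'
  ssh root@$host '/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode'
done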

At this point, the Hadoop HA cluster installation is complete.

Install MySQL on CentOS 7:
https://www.cnblogs.com/yowamushi/p/8043054.html

Install mysql-server with yum:
yum install -y mysql-server
service mysqld start
chkconfig mysqld on
Install the MySQL connector with yum:
yum install -y mysql-connector-java
Copy the MySQL connector into Hive's lib directory:
cp /usr/share/java/mysql-connector-java-5.1.17.jar /usr/local/hive/lib (mind the connector version; see the note about mysql-connector-java-5.1.46 below)
Change the MySQL root password: set password for root@localhost = password('123456');
Create the Hive metastore database in MySQL and grant Hive access:
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'root'@'%' identified by '123456';
grant all privileges on hive_metadata.* to 'root'@'localhost' identified by '123456';
grant all privileges on hive_metadata.* to 'root'@'hadoop01' identified by '123456';
flush privileges;
use hive_metadata;
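
A quick way to confirm the grants work (a sketch; it assumes the mysql client is available and that MySQL was installed on hadoop01, so adjust the host if not):
# hive_metadata should appear in the output
mysql -h hadoop01 -uroot -p123456 -e 'show databases;'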


Install Hive 2.3.4: https://blog.csdn.net/pengjunlee/article/details/81607890
A detailed explanation of Hive configuration options: https://blog.csdn.net/aaa1117a8w5s6d/article/details/16884401
Configure environment variables:
vim /etc/profile
The articles above cover the steps in detail.

Hive only needs to be installed on a single node; there is no need to copy it to the others.

Watch out for a big pitfall: once the Hive metastore is stored in MySQL, the MySQL connector jar must be present in Hive's lib directory, and the version matters; I used mysql-connector-java-5.1.46-bin.jar.
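
Since this guide defers the Hive configuration to the linked article, here is only a minimal hive-site.xml sketch of the MySQL metastore connection; the property names are standard Hive settings, but the host, database, user and password below just mirror the values used above and should be adjusted to your setup:

<configuration>
  <!-- adjust host, database name, user and password to your environment -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01:3306/hive_metadata?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>

For Hive 2.x the metastore schema also has to be initialized once: schematool -dbType mysql -initSchema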

Install Scala 2.11.12:
Extract it.
Configure environment variables.
Verify the installation: scala -version
Copy the scala directory to the other nodes and configure their environment variables too.


Install Kafka: kafka_2.11-0.10.2.2.tgz
Extract it.
Rename the directory to kafka.

vim /usr/local/kafka/config/server.properties
zookeeper.connect=192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181
Set broker.id in server.properties to a unique value on each node (for example 1 through 5); when copying to the other machines, remember to change broker.id, since it must not repeat.
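
For orientation, a minimal server.properties sketch for one broker; broker.id and the listener address change per host, and the log.dirs path is an assumption here rather than a value from the original guide:

broker.id=1
listeners=PLAINTEXT://hadoop01:9092
# log.dirs below is an assumed path; pick your own
log.dirs=/home/data/kafka-logs
zookeeper.connect=192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181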

Install slf4j:
Upload slf4j-1.7.6.zip to /usr/local.
unzip slf4j-1.7.6.zip
Copy slf4j-nop-1.7.6.jar from the slf4j package into Kafka's libs directory.

Copy the kafka directory to the other machines, changing broker.id on each.

Run the following on each of the five machines: nohup bin/kafka-server-start.sh config/server.properties &

Use the basic commands below to check that the Kafka cluster works:

bin/kafka-topics.sh --zookeeper 192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create

bin/kafka-console-producer.sh --broker-list 192.168.0.107:9092,192.168.0.108:9092,192.168.0.109:9092 --topic TestTopic

bin/kafka-console-consumer.sh --zookeeper 192.168.0.107:2181,192.168.0.108:2181,192.168.0.109:2181 --topic TestTopic --from-beginning

Test passed.

Install Spark 2.2.2:
Extract it.
vi spark-env.sh
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.105
export SPARK_WORKER_MEMORY=2g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

vim slaves (since hadoop01 is the active NameNode and Spark uses quite a bit of memory, hadoop01 is not used as a Spark worker for now)
hadoop02
hadoop03
hadoop04
hadoop05

Start Spark:
sbin/start-all.sh

Check that port 8080 is listening (the Spark Master web UI).
Check that spark-shell and spark-sql work.
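
One more quick check (a sketch, not part of the original steps): submit the bundled SparkPi example to the standalone master. The examples jar path below assumes the pre-built Spark 2.2.2 distribution and may differ in your layout:

/usr/local/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.0.105:7077 \
  /usr/local/spark/examples/jars/spark-examples_2.11-2.2.2.jar 10
# The driver output should contain a line like "Pi is roughly 3.14..."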

Environment variables:
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

Thanks!

Start the cluster:
hadoop03:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop04:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop05:
/usr/local/zookeeper/bin/zkServer.sh start
hadoop01:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
hadoop02:
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
hadoop01:
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
nohup /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties & (run on every Kafka node)
/usr/local/spark/sbin/start-all.sh


Shut down the cluster (the order matters!):
hadoop01:
/usr/local/kafka/bin/kafka-server-stop.sh (run on every Kafka node)
/usr/local/spark/sbin/stop-all.sh
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
/usr/local/hadoop/sbin/stop-yarn.sh
hadoop02:
/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
hadoop01:
/usr/local/hadoop/sbin/stop-dfs.sh
hadoop03:
/usr/local/zookeeper/bin/zkServer.sh stop
hadoop04:
/usr/local/zookeeper/bin/zkServer.sh stop
hadoop05:
/usr/local/zookeeper/bin/zkServer.sh stop