1. About the user account:
All operations here are performed as the root user, which is not recommended in a real production environment!
2. About syncing files or directories between hosts:
Command for syncing a single file between CentOS hosts:
scp /local/dir/local_file remote_user@remote_host:/remote/dir/
#For example:
#sudo scp /opt/software/hadoop-2.9.2/etc/hadoop/core-site.xml root@centos002:/opt/software/hadoop-2.9.2/etc/hadoop/
#Copies core-site.xml to /opt/software/hadoop-2.9.2/etc/hadoop/ on the remote host
#sudo scp /opt/software/hadoop-2.9.2/etc/hadoop/* root@centos002:/opt/software/hadoop-2.9.2/etc/hadoop/
#Copies every file under the local /opt/software/hadoop-2.9.2/etc/hadoop/ directory to /opt/software/hadoop-2.9.2/etc/hadoop/ on the remote host
Command for syncing a directory between CentOS hosts:
scp -r /local/dir/path remote_user@remote_host:/remote/dir/
#For example:
#sudo scp -r /opt/software/zookeeper-3.4.13 root@centos002:/opt/software/
#sudo scp -r /opt/software/zookeeper-3.4.13 root@centos003:/opt/software/
#sudo scp -r /opt/software/zookeeper-3.4.13 root@centos004:/opt/software/
#The local zookeeper-3.4.13 directory and every file inside it are copied to /opt/software/ on the remote host
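If you only want to transfer files that have changed, rsync is an alternative to scp (a sketch, assuming rsync is installed on both hosts):
#rsync -avz /opt/software/zookeeper-3.4.13/ root@centos002:/opt/software/zookeeper-3.4.13/
#-a preserves permissions and timestamps, -v prints progress, -z compresses during transfer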
Every node's /etc/hosts should contain the following mappings:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.15.131 centos001
192.168.15.132 centos002
192.168.15.133 centos003
192.168.15.134 centos004
vim /etc/hostname
Edit the /etc/sysconfig/network-scripts/ifcfg-ens33 file to configure a static IP address:
vim /etc/sysconfig/network-scripts/ifcfg-ens33
systemctl stop firewalld
systemctl disable firewalld
Configure the environment variables:
vim /etc/profile
export JAVA_HOME=/opt/software/java/jdk1.8.0_211/
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/software/hadoop-2.9.2/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export ZOOKEEPER_HOME=/opt/software/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HIVE_HOME=/opt/software/hive-2.3.4
export PATH=$PATH:$HIVE_HOME/bin
export PYSPARK_PYTHON=/usr/bin/python3.6
export SPARK_HOME=/opt/software/spark-2.3.3
export PATH=$PATH:$SPARK_HOME/bin
export KAFKA_HOME=/opt/software/kafka-2.1.1
export PATH=$PATH:$KAFKA_HOME/bin
export HBASE_HOME=/opt/software/hbase-1.4.10
export PATH=$PATH:$HBASE_HOME/bin
#export JAVA_LIBRARY_PATH=/opt/software/hadoop-2.9.2/lib/native
Reload the configuration on every CentOS node:
source /etc/profile
After everything is configured, reboot the virtual machines.
Run the following once on every CentOS node as root (prerequisite: every machine is online and reachable by ping):
ssh-keygen -t rsa
ssh-copy-id centos001
ssh-copy-id centos002
ssh-copy-id centos003
ssh-copy-id centos004
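After distributing the keys, a quick way to confirm passwordless login from the current node (a sketch):
for h in centos001 centos002 centos003 centos004; do ssh $h hostname; done
#each remote hostname should be printed without a password prompt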
All CentOS nodes in this setup use jdk1.8.0_211.
Copy the template file zoo_sample.cfg to zoo.cfg and change its contents to:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/software/zookeeper-3.4.13/data
dataLogDir=/opt/software/zookeeper-3.4.13/datalog
# the port at which the clients will connect
clientPort=2181
server.1=centos001:2888:3888
server.2=centos002:2888:3888
server.3=centos003:2888:3888
server.4=centos004:2888:3888
mkdir /opt/software/zookeeper-3.4.13/data
mkdir /opt/software/zookeeper-3.4.13/datalog
vim /opt/software/zookeeper-3.4.13/data/myid
Set the content of the myid file to 1.
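Instead of opening vim, the file can also be written in one line (a sketch):
echo 1 > /opt/software/zookeeper-3.4.13/data/myid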
scp -r /opt/software/zookeeper-3.4.13 root@centos002:/opt/software/
scp -r /opt/software/zookeeper-3.4.13 root@centos003:/opt/software/
scp -r /opt/software/zookeeper-3.4.13 root@centos004:/opt/software/
Change the myid contents on CentOS002, CentOS003, and CentOS004 to 2, 3, and 4 respectively:
vim /opt/software/zookeeper-3.4.13/data/myid
Run the following command on every CentOS node:
zkServer.sh start
Check whether it started successfully:
zkServer.sh status
jps
If every CentOS node shows Mode: follower or Mode: leader and jps lists a QuorumPeerMain process, the ZooKeeper cluster has started successfully.
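An additional health check is ZooKeeper's four-letter command ruok (a sketch, assuming nc/netcat is available):
echo ruok | nc centos001 2181
#a healthy server replies: imok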
<configuration>
  <!-- Assemble the NameNode addresses into one nameservice, mycluster -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- Directory where Hadoop stores files generated at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.9.2/data</value>
  </property>
  <!-- zkServer addresses that zkfc connects to -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>centos001:2181,centos002:2181,centos003:2181,centos004:2181</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
<configuration>
  <!-- NameNode data storage directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <!-- DataNode data storage directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
  </property>
  <!-- JournalNode data storage directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
  </property>
  <!-- Name of the fully distributed cluster (nameservice) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNodes that belong to the cluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>centos001:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>centos002:8020</value>
  </property>
  <!-- HTTP addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>centos001:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>centos002:9870</value>
  </property>
  <!-- Location of the NameNode edits on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://centos001:8485;centos002:8485/mycluster</value>
  </property>
  <!-- Failover proxy class the client uses to determine which NameNode is Active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method, so that only one NameNode serves clients at a time -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- sshfence requires passwordless ssh, so point to the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Declare the ResourceManager cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <!-- Logical ids of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- ========== rm1 configuration ========== -->
  <!-- Hostname of rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>centos003</value>
  </property>
  <!-- Web UI address of rm1 -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>centos003:8088</value>
  </property>
  <!-- Internal communication address of rm1 -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>centos003:8032</value>
  </property>
  <!-- Address the ApplicationMaster uses to request resources from rm1 -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>centos003:8030</value>
  </property>
  <!-- Address the NodeManagers connect to -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>centos003:8031</value>
  </property>
  <!-- ========== rm2 configuration ========== -->
  <!-- Hostname of rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>centos004</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>centos004:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>centos004:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>centos004:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>centos004:8031</value>
  </property>
  <!-- ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>centos001:2181,centos002:2181,centos003:2181,centos004:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store the ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Environment variables inherited by containers -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
centos002
centos003
centos004
Add the following line (preferably with the absolute path of the JDK):
export JAVA_HOME=/opt/software/java/jdk1.8.0_211
scp -r /opt/software/hadoop-2.9.2 root@centos002:/opt/software/
scp -r /opt/software/hadoop-2.9.2 root@centos003:/opt/software/
scp -r /opt/software/hadoop-2.9.2 root@centos004:/opt/software/
Run on every CentOS node (skip this if the ZooKeeper cluster is already running):
zkServer.sh start
hdfs zkfc -formatZK
hadoop-daemon.sh start journalnode
hdfs namenode -format
hadoop-daemon.sh start namenode
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start zkfc
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
start-yarn.sh
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
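As a quick end-to-end check against the HA nameservice (a sketch; the test path is arbitrary):
hdfs dfs -mkdir -p /hatest
hdfs dfs -put /etc/hosts /hatest/
hdfs dfs -cat /hatest/hosts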
centos003
centos004
## Set the JAVA directory
JAVA_HOME=/opt/software/java/jdk1.8.0_211/
## Hadoop configuration directory, used for reading files on HDFS and running against the YARN cluster
HADOOP_CONF_DIR=/opt/software/hadoop-2.9.2/etc/hadoop
YARN_CONF_DIR=/opt/software/hadoop-2.9.2/etc/hadoop
## Specify the Spark master IP and the communication port for submitting jobs
# Tell Spark which machine the master runs on
#export SPARK_MASTER_HOST=master
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=centos001:2181,centos002:2181,centos003:2181,centos004:2181 -Dspark.deploy.zookeeper.dir=/spark-ha"
# spark.deploy.recoveryMode: enable HA based on ZooKeeper
# spark.deploy.zookeeper.url: the ZooKeeper connection addresses
# spark.deploy.zookeeper.dir: the path registered in ZooKeeper
# Spark master communication port
export SPARK_MASTER_PORT=7077
# Spark master web UI port
SPARK_MASTER_WEBUI_PORT=8080
# CPU cores available to each worker
SPARK_WORKER_CORES=1
# Memory available to each worker
SPARK_WORKER_MEMORY=1g
# Worker communication port
SPARK_WORKER_PORT=7078
# Worker web UI port
SPARK_WORKER_WEBUI_PORT=8081
## History server settings
# Store the Spark application history logs in the /sparklog directory on HDFS
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://centos001:8020/sparklog/ -Dspark.history.fs.cleaner.enabled=true"
Create the folder on HDFS where the application history logs will be stored:
hdfs dfs -mkdir /sparklog
hdfs dfs -chmod 777 /sparklog
# Enable Spark event logging
spark.eventLog.enabled true
# Set the HDFS path where Spark event logs are written
spark.eventLog.dir hdfs://centos001:8020/sparklog/
# Enable compression of Spark event logs
spark.eventLog.compress true
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
sudo scp -r /opt/software/spark-2.3.3 root@centos002:/opt/software/
#You will be prompted for the remote root user's password
sudo scp -r /opt/software/spark-2.3.3 root@centos003:/opt/software/
sudo scp -r /opt/software/spark-2.3.3 root@centos004:/opt/software/
# On CentOS001, start the master and the workers:
./sbin/start-all.sh
# On CentOS002, start the standby master:
./sbin/start-master.sh
Result: CentOS001 shows ACTIVE status and CentOS002 shows STANDBY status, which means the Spark 2.3.3 HA setup succeeded.
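To confirm that jobs can be submitted against both masters, a SparkPi run can be used (a sketch; the examples jar name may differ in your distribution):
spark-submit --master spark://centos001:7077,centos002:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.3.jar 100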
wget https://repo.mysql.com/mysql57-community-release-el7-11.noarch.rpm
rpm -ivh mysql57-community-release-el7-11.noarch.rpm
yum repolist all |grep mysql
yum install -y mysql-community-server --nogpgcheck
systemctl start mysqld
systemctl status mysqld
Adjust the password policy:
vim /etc/my.cnf
Append these two lines at the end:
validate_password_policy=Low
validate_password_length=6
Restart the mysqld service:
systemctl restart mysqld
Retrieve the temporary password:
grep 'temporary password' /var/log/mysqld.log
Log in with the temporary password:
mysql -uroot -p
Change the password of MySQL's root account:
set password = "123456";
List the databases in MySQL:
show databases;
Create the Hive metastore database "hive":
create database hive;
Create the user hive (a MySQL user dedicated to Hive, username: hive, password: 123456):
create user 'hive' identified by '123456';
Grant all privileges on the hive database to the hive user:
grant all privileges on hive.* to hive identified by '123456';
flush privileges;
Verify by logging in as the hive user; if the hive database is visible, the setup succeeded:
mysql -uhive -p123456
show databases;
Extract hive-2.3.4 on CentOS002 and copy the MySQL JDBC driver into the hive-2.3.4/lib directory:
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar /opt/software/hive-2.3.4/lib/
Modify hive-env.sh as follows:
HADOOP_HOME=/opt/software/hadoop-2.9.2
export HIVE_CONF_DIR=/opt/software/hive-2.3.4/conf
export HIVE_AUX_JARS_PATH=/opt/software/hive-2.3.4/lib
cp hive-default.xml.template hive-default.xml
Modify hive-site.xml as follows:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://centos001:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
schematool -initSchema -dbType mysql
Start the Hadoop cluster:
start-dfs.sh
start-yarn.sh
jps
Enter hive:
hive
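A quick smoke test from the shell (a sketch) to confirm Hive can reach its MySQL metastore:
hive -e "show databases;"
hive -e "create database if not exists test_db; show databases;"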
scp -r /opt/software/hive-2.3.4 root@centos003:/opt/software/
The configuration in hive-site.xml on CentOS002 is as follows:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://centos001:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://centos002:9083</value>
  </property>
</configuration>
Connection flow:
client (beeline) -> hiveserver2 -> metastore -> mysql
client (hive) -> metastore -> mysql
If hiveserver2 is started without a metastore address configured, it starts an embedded metastore of its own; that metastore then connects to mysql according to hive-site.xml.
Start the metastore service and the hiveserver2 service in the background on CentOS002:
nohup hive --service metastore &
nohup hive --service hiveserver2 &
The configuration in hive-site.xml on CentOS003 is as follows:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://centos002:9083</value>
</property>
</configuration>
hive
Enter beeline:
/opt/software/hive-2.3.4/bin/beeline
Enter the connect command:
!connect jdbc:hive2://centos002:10000
The connection succeeds.
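Beeline can also connect non-interactively with the JDBC URL, a user, and a query on the command line (a sketch; using root here relies on the hadoop.proxyuser.root settings in core-site.xml above):
/opt/software/hive-2.3.4/bin/beeline -u jdbc:hive2://centos002:10000 -n root -e "show databases;"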
After extracting sqoop 1.4.7 on CentOS003:
mv sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/software/hadoop-2.9.2
export HADOOP_MAPRED_HOME=/opt/software/hadoop-2.9.2
export HIVE_HOME=/opt/software/hive-2.3.4
export ZOOKEEPER_HOME=/opt/software/zookeeper-3.4.13
export ZOOCFGDIR=/opt/software/zookeeper-3.4.13
export HBASE_HOME=/opt/software/hbase-1.4.10
#export ACCUMULO_HOME=/opt/software/data/data/accumulo
#export HCAT_HOME=/opt/software/data/servers/hbase/logs/hcat-logs
Download mysql-connector-java-5.1.48, extract it, and copy mysql-connector-java-5.1.48.jar into sqoop's lib directory:
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar /opt/software/sqoop-1.4.7/lib/
Or copy the one from hive's lib directory into sqoop's lib directory:
cp /opt/software/hive-2.3.4/lib/mysql-connector-java-5.1.48.jar /opt/software/sqoop-1.4.7/lib/
Connect to MySQL with a user that has remote login privileges:
bin/sqoop list-databases --connect jdbc:mysql://centos001:3306/?useSSL=false --username hive --password 123456
sqoop import --connect jdbc:mysql://centos001:3306/hive?useSSL=false --username hive --password 123456 --table DBS -m 1
import
Moving data from a non-big-data system (an RDBMS) into the big data cluster (HDFS, Hive, HBase) is called an import.
--connect jdbc:mysql://centos001:3306/hive?useSSL=false
--username hive
--password 123456
#Connect to the hive database in the remote MySQL instance
--table DBS
#Import the DBS table from MySQL's hive database
-m 1
#Use a single map task
By default the table is imported into HDFS under "/user/<username>/<table name>":
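To check the result on HDFS (a sketch; when the job runs as root, the landing directory is /user/root/DBS):
hdfs dfs -ls /user/root/DBS
hdfs dfs -cat /user/root/DBS/part-m-*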
Upload to the virtual machine and extract it:
sudo tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /opt/software/
Rename it:
sudo mv /opt/software/apache-flume-1.9.0-bin /opt/software/flume-1.9.0
Check the Flume version:
flume-ng version
Test:
Monitor a directory and print its data to the console.
# First, name the agent a1
# and name its source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Configure the source, channel, and sink separately
# Configure the source
# Set the source type to spooldir, which watches a directory for new files
# Because different components can have properties with the same name, each property
# is written as: <agent name>.sources.<component name>.<property> = <value>
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/software/data/
a1.sources.r1.fileSuffix = .ok
a1.sources.r1.fileHeader = true

# Attach an interceptor named i1 to source r1
a1.sources.r1.interceptors = i1
# Setting the interceptor type to timestamp would insert the processing time in milliseconds into the event header
# a1.sources.r1.interceptors.i1.type = timestamp
# Setting the interceptor type to regex_filter filters events by a regular expression
a1.sources.r1.interceptors.i1.type = regex_filter
# The regular expression
a1.sources.r1.interceptors.i1.regex = \\d{3,6}
# excludeEvents = true means matching events are dropped and non-matching events pass through
a1.sources.r1.interceptors.i1.excludeEvents = true

# Configure the sink
# Use logger as the sink so collected data is printed directly to the console
a1.sinks.k1.type = logger

# Configure the channel
# Set the channel type to memory, so events are buffered in memory
a1.channels.c1.type = memory

# Wire the components together
# Point the source's channels property at c1
a1.sources.r1.channels = c1
# Point the sink's channel property at c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -f ./spoolingtest.conf -Dflume.root.logger=DEBUG,console
Create the /opt/software/data directory:
mkdir /opt/software/data
Create a file in the /opt/software/data directory, add some content, and watch the log printed by the flume process:
vim /opt/software/data/789.txt
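For example (a sketch; the file name is arbitrary): because of the regex_filter interceptor above, lines containing a run of 3 to 6 digits are dropped, while the other lines show up in the agent's console output.
printf "hello flume\n123456\n" > /opt/software/data/test02.txt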
cd /opt/software/kafka-2.1.1/config/
vim server.properties
# Globally unique broker id; must be a number and must not be repeated
broker.id=0
# Number of threads handling network requests
num.network.threads=3
# Number of threads handling disk I/O
num.io.threads=8
# Send buffer size of the socket
socket.send.buffer.bytes=102400
# Receive buffer size of the socket
socket.receive.buffer.bytes=102400
# Maximum size of a socket request
socket.request.max.bytes=104857600
# Path where Kafka stores its log (data) files; it does not need to exist beforehand, Kafka creates it automatically;
# multiple disk paths can be configured, separated by ","
log.dirs=/opt/software/kafka-2.1.1/datas
# Number of partitions per topic on this broker
num.partitions=1
# Number of threads used to recover and clean up data under the data dirs
num.recovery.threads.per.data.dir=1
# Replication factor of the offsets topic (the default is 1)
offsets.topic.replication.factor=2
# Maximum time a segment file is retained before deletion
log.retention.hours=168
# Maximum size of each segment file (default 1G)
log.segment.bytes=1073741824
# How often to check whether data has expired (default: every 5 minutes)
log.retention.check.interval.ms=300000
# ZooKeeper cluster connection addresses (a /kafka chroot is created under the zk root to keep things tidy)
zookeeper.connect=centos001:2181,centos002:2181,centos003:2181,centos004:2181/kafka
scp -r /opt/software/kafka-2.1.1 root@centos002:/opt/software/
scp -r /opt/software/kafka-2.1.1 root@centos003:/opt/software/
In the copies on CentOS002 and CentOS003, change broker.id to 1 and 2 respectively.
Run on every CentOS node (skip if the ZooKeeper cluster is already running):
zkServer.sh start
Once the ZooKeeper cluster is up,
run the following on CentOS001, CentOS002, and CentOS003:
bin/kafka-server-start.sh -daemon config/server.properties
bin/kafka-topics.sh --create --zookeeper centos001:2181/kafka --topic test --partitions 1 --replication-factor 1
bin/kafka-console-consumer.sh --bootstrap-server centos002:9092 --topic test
bin/kafka-console-producer.sh --broker-list centos003:9092 --topic test
Finally, if CentOS002 receives the message, the Kafka cluster has been deployed successfully.
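To inspect the topic's partition and replica assignment (a sketch):
bin/kafka-topics.sh --describe --zookeeper centos001:2181/kafka --topic test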
Add the following:
export HBASE_PID_DIR=/opt/software/hbase-1.4.10/pids
export HBASE_MANAGES_ZK=false
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://mycluster/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>centos001,centos002,centos003,centos004</value>
</property>
</configuration>
centos001
centos002
centos003
Create a backup-masters file in the conf/ directory:
vim conf/backup-masters
and write into it:
centos004
scp -r /opt/software/hbase-1.4.10 root@centos002:/opt/software/
scp -r /opt/software/hbase-1.4.10 root@centos003:/opt/software/
scp -r /opt/software/hbase-1.4.10 root@centos004:/opt/software/
Start the Hadoop cluster and the ZooKeeper cluster first,
then start the HBase cluster:
bin/start-hbase.sh
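To verify (a sketch): jps should show an HMaster process on the node where start-hbase.sh was run and on the backup master, and HRegionServer processes on the regionserver nodes; the status command in the HBase shell also reports the live servers.
echo "status" | bin/hbase shell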
With that, the HBase HA cluster is set up successfully!