Virtualization software: VirtualBox
Linux distribution: Ubuntu 20.04.4
VM cores: 1 core
VM memory: 2 GB
JDK version: 1.8.0_202
Hadoop version: 3.2.3
ZooKeeper version: 3.8.0
Hive version: 3.1.3
MySQL version: 8.0.28
Hive's default metastore database is Derby. Once a Hive session starts, it takes exclusive hold of the Derby metastore and cannot share the metadata with other clients, so we point Hive's metastore at MySQL instead.
Based on that analysis, the cluster is planned as follows:
| node01 | node02 | node03 |
| --- | --- | --- |
| NameNode | NameNode | |
| JournalNode | JournalNode | JournalNode |
| DataNode | DataNode | DataNode |
| ZK | ZK | ZK |
| ResourceManager | ResourceManager | |
| NodeManager | NodeManager | NodeManager |
| | HiveServer2 | |
| | Metastore | |
| | | MySQL |
$ apt install -y mysql-server=8.0.28-0ubuntu0.20.04.4
$ mysql_secure_installation # initialize MySQL and follow the prompts
mysql> create user 'root'@'%' identified by '123456';
mysql> grant all privileges on *.* to 'root'@'%';
mysql> alter user 'root'@'%' identified with mysql_native_password by '123456';
mysql> flush privileges;
Error: ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
This means MySQL's password validation policy is enabled. Either relax the policy (see below) or choose a password that satisfies it.
$ vim /etc/mysql/mysql.conf.d/mysqld.cnf
# change bind-address so MySQL accepts remote connections
bind-address = 0.0.0.0
$ service mysql restart
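A quick way to confirm the change took effect is to check that mysqld now listens on all interfaces rather than only on localhost:
$ netstat -nltp | grep 3306   # mysqld should now show 0.0.0.0:3306 instead of 127.0.0.1:3306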
mysql> show variables like '%validate_password%';
+--------------------------------------+--------+
| Variable_name                         | Value  |
+--------------------------------------+--------+
| validate_password.check_user_name    | ON     |
| validate_password.dictionary_file    |        |
| validate_password.length             | 8      |
| validate_password.mixed_case_count   | 1      |
| validate_password.number_count       | 1      |
| validate_password.policy             | MEDIUM |
| validate_password.special_char_count | 1      |
+--------------------------------------+--------+
7 rows in set (0.00 sec)
A temporary change (effective until MySQL restarts):
mysql> set global validate_password.policy=0;
mysql> set global validate_password.length=6;
For a permanent change, edit my.cnf and restart MySQL; the parameter names are the same.
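For reference, a sketch of the permanent version in the [mysqld] section of /etc/mysql/mysql.conf.d/mysqld.cnf (the same file edited above; adjust the values to whatever policy your environment requires):
[mysqld]
validate_password.policy = LOW    # equivalent to policy=0
validate_password.length = 6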
$ vim /etc/profile
# append the following two lines
export HIVE_HOME=/opt/hive-3.1.3
export PATH=$PATH:$JAVA_HOME/bin:$ZK_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
$ source /etc/profile
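A quick sanity check that the new variables are visible in the current shell:
$ echo $HIVE_HOME   # should print /opt/hive-3.1.3
$ which hive        # should resolve to /opt/hive-3.1.3/bin/hive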
Resolve the logging jar conflict with Hadoop (duplicate SLF4J binding)
$ cd $HIVE_HOME/lib
$ mv log4j-slf4j-impl-2.17.1.jar log4j-slf4j-impl-2.17.1.jar.bak
Pick the JDBC connector jar that matches the MySQL version and hand it to Hive
$ mv mysql-connector-java-8.0.28.jar /opt/hive-3.1.3/lib/
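To verify the connector is where Hive will look for it:
$ ls $HIVE_HOME/lib | grep mysql-connector   # should list mysql-connector-java-8.0.28.jar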
$ vim $HIVE_HOME/conf/hive-site.xml
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node03:3306/metastore?useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
    </property>
    <!-- JDBC driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- Metastore event notification authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Hive's default warehouse directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Address of the metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node02:9083</value>
    </property>
    <!-- Host HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node02</value>
    </property>
    <!-- Port HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
</configuration>
$ vim $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
Configuration changes take effect only after the Hadoop cluster is restarted.
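One way to do the restart, assuming the stock start/stop scripts in $HADOOP_HOME/sbin (already on PATH) are used to manage this cluster:
$ stop-yarn.sh && stop-dfs.sh     # stop YARN first, then HDFS
$ start-dfs.sh && start-yarn.sh   # bring HDFS back up, then YARN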
$ mysql -uroot -p123456
mysql> create database metastore;
mysql> exit;
$ schematool -initSchema -dbType mysql -verbose
Error: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Hive and Hadoop ship different guava versions; unify both on the newer one.
$ cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
$ mv $HIVE_HOME/lib/guava-19.0.jar $HIVE_HOME/lib/guava-19.0.jar.bak
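After swapping the jars, re-run the schematool command above. If the initialization succeeds, the metastore tables should now exist in MySQL; a quick check using the credentials created earlier:
$ mysql -uroot -p123456 -e "use metastore; show tables;"   # should list the Hive metastore tables (DBS, TBLS, ...)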
Before testing, make sure the Hadoop cluster is up.
hiveserver2 and metastore are server-side services and must stay running.
beeline is the client for accessing Hive; it only needs to run on the nodes that need access.
$ hive --service metastore # runs in the foreground by default
$ hive --service hiveserver2 # runs in the foreground by default
$ beeline -u jdbc:hive2://node02:10000 -n root
0: jdbc:hive2://node02:10000> create table test (id int);
0: jdbc:hive2://node02:10000> insert into test values(1);
0: jdbc:hive2://node02:10000> select * from test; # output like the following indicates success
+----------+
| test.id  |
+----------+
| 1        |
+----------+
1 row selected (3.987 seconds)
Error: User: root is not allowed to impersonate root
hadoop.proxyuser.root.hosts and hadoop.proxyuser.root.groups are already configured, yet this error still appears. The root cause is unclear, but forcing a NameNode failover and then retrying lets the connection succeed:
$ hdfs haadmin -getServiceState nn1
standby
$ hdfs haadmin -getServiceState nn2
active
$ jps
5632 Jps
4993 RunJar
4068 NameNode
4853 NodeManager
4518 DataNode
4215 JournalNode
4392 DFSZKFailoverController
5005 RunJar
1070 QuorumPeerMain
$ kill -9 4068
$ hdfs haadmin -getServiceState nn1
active
$ hadoop-daemon.sh start namenode
$ hdfs haadmin -getServiceState nn2
standby
$ vim $HIVE_HOME/bin/hiveService.sh
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
    mkdir -p $HIVE_LOG_DIR
fi

# Check whether a process is healthy: arg 1 is the process name, arg 2 is its listening port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore is running normally" || echo "Metastore is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 is running normally" || echo "HiveServer2 is not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
$ chmod +x $HIVE_HOME/bin/hiveService.sh
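Once executable, the script can be driven like this (mirroring its case branches; $HIVE_HOME/bin is already on PATH):
$ hiveService.sh start     # launch metastore and HiveServer2 in the background, logs under $HIVE_HOME/logs
$ hiveService.sh status    # report whether both services are healthy
$ hiveService.sh restart
$ hiveService.sh stop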
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.3</version>
</dependency>
</dependencies>
Error: org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT
In the local Maven repository, go into every SNAPSHOT directory under javax.el and rename each pom.lastUpdated file to pom.
For example, rename javax.el-3.0.1-b06-SNAPSHOT.pom.lastUpdated to javax.el-3.0.1-b06-SNAPSHOT.pom.
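A minimal sketch of the rename, assuming the default local repository under ~/.m2/repository (adjust the path if settings.xml points elsewhere):
$ cd ~/.m2/repository/org/glassfish/javax.el/3.0.1-b06-SNAPSHOT
$ mv javax.el-3.0.1-b06-SNAPSHOT.pom.lastUpdated javax.el-3.0.1-b06-SNAPSHOT.pom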
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Main {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver and connect to HiveServer2
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://node02:10000", "root", "123456");
        Statement stmt = con.createStatement();
        String sql = "select * from test";
        ResultSet res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}