

Hive on Spark Installation

1. Prerequisites

1. Install Hadoop (a high-availability setup is recommended)

If Hadoop is not installed yet, refer to 采集项目(HA)(五台服务器)_ha数据采集-CSDN博客.
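
Before going further, it is worth a quick check that the Hadoop cluster is actually up. A minimal sanity check, assuming Hadoop is already on the PATH and HDFS/YARN have been started:

[atguigu@hadoop100 ~]$ hadoop version      # prints the installed Hadoop version
[atguigu@hadoop100 ~]$ jps                 # should list NameNode/DataNode/ResourceManager/NodeManager processes
[atguigu@hadoop100 ~]$ hadoop fs -ls /     # confirms HDFS is reachable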

2. Install Hive

1. Extract the archive

[atguigu@hadoop100 software]$ tar -zxvf /opt/software/apache-hive-3.1.3.tar.gz -C /opt/module/
[atguigu@hadoop100 software]$ mv /opt/module/apache-hive-3.1.3-bin/ /opt/module/hive

2. Environment variables

[atguigu@hadoop100 software]$ sudo vim /etc/profile.d/my_env.sh
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
[atguigu@hadoop100 software]$ source /etc/profile.d/my_env.sh
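
A quick check that the variables took effect (not part of the original steps, just a sanity check):

[atguigu@hadoop100 software]$ echo $HIVE_HOME     # should print /opt/module/hive
[atguigu@hadoop100 software]$ hive --version      # should report Hive 3.1.3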

Resolve the logging JAR conflict: go into /opt/module/hive/lib and rename the conflicting jar.

[atguigu@hadoop100 lib]$ mv log4j-slf4j-impl-2.17.1.jar log4j-slf4j-impl-2.17.1.jar.bak

3. Configure the Hive metastore to use MySQL

Copy the MySQL JDBC driver into Hive's lib directory:

[atguigu@hadoop102 lib]$ cp /opt/software/mysql/mysql-connector-j-8.0.31.jar /opt/module/hive/lib/

Configure the Metastore to use MySQL:

[atguigu@hadoop102 conf]$ vim hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop100:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop101</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>

4. Start Hive

1. Log in to MySQL

[atguigu@hadoop100 conf]$ mysql -uroot -p000000

2. Create the Hive metastore database

mysql> create database metastore;

3. Initialize the Hive metastore schema

[atguigu@hadoop100 conf]$ schematool -initSchema -dbType mysql -verbose
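
If the initialization succeeds, the metastore database now contains Hive's schema tables. An optional check, using the same root credentials as above:

[atguigu@hadoop100 conf]$ mysql -uroot -p000000 -e "use metastore; show tables;"    # should list tables such as DBS, TBLS, COLUMNS_V2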

4. Change the metadata character set

mysql> use metastore;
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
mysql> alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
mysql> quit;
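
An optional double-check, run inside the mysql client before quitting (or after logging back in): the Collation column in the output should now show a utf8 collation for the two modified columns.

mysql> show full columns from metastore.COLUMNS_V2 like 'COMMENT';
mysql> show full columns from metastore.TABLE_PARAMS like 'PARAM_VALUE';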

5. Start the Hive client

[atguigu@hadoop100 hive]$ bin/hive

6. To connect from client software, start HiveServer2:

[atguigu@hadoop100 bin]$ hiveserver2
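
HiveServer2 can take a minute or two to come up. Besides GUI client software, you can also connect with Beeline, which ships with Hive; a minimal example, using the thrift host and port configured in hive-site.xml above:

[atguigu@hadoop100 bin]$ beeline -u "jdbc:hive2://hadoop101:10000" -n atguigu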

3. Install the pure Spark package (without Hadoop)

1. Download link for the pure package

Downloads | Apache Spark (choose the "without Hadoop" build, spark-3.3.1-bin-without-hadoop.tgz)

2. Extract the archive

[atguigu@hadoop102 software]$ tar -zxvf spark-3.3.1-bin-without-hadoop.tgz -C /opt/module/
[atguigu@hadoop102 software]$ mv /opt/module/spark-3.3.1-bin-without-hadoop /opt/module/spark

3. Edit spark-env.sh

[atguigu@hadoop102 software]$ mv /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh
[atguigu@hadoop102 software]$ vim /opt/module/spark/conf/spark-env.sh

Add the following line:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
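
SPARK_DIST_CLASSPATH is what lets the without-Hadoop Spark build find Hadoop's jars at runtime. You can preview exactly what will be injected by running the same command yourself (just a sanity check):

[atguigu@hadoop102 software]$ hadoop classpath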

4. Environment variables

[atguigu@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
[atguigu@hadoop102 software]$ source /etc/profile.d/my_env.sh

5. Create a Spark configuration file in Hive's conf directory

[atguigu@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
spark.master               yarn
spark.eventLog.enabled     true
spark.eventLog.dir         hdfs://mycluster:8020/spark-history
spark.executor.memory      1g
spark.driver.memory        1g

Note: hdfs://mycluster:8020/spark-history in the configuration file is the NameNode address; mycluster is my HA nameservice name (if you are not running HA, just use the NameNode's IP address or hostname instead).

[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-history
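
If you are not sure what to put in place of mycluster, the cluster's default filesystem can be read straight from the Hadoop configuration (a quick lookup, not part of the original steps):

[atguigu@hadoop102 software]$ hdfs getconf -confKey fs.defaultFS     # e.g. hdfs://mycluster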

6. Upload the pure Spark JARs to HDFS

Note 1: The pure Spark JAR package does not bundle Hadoop- or Hive-related dependencies, which avoids dependency conflicts.

Note 2: Hive jobs are ultimately executed by Spark, and Spark job resources are scheduled by YARN, so a job may be assigned to any node in the cluster. The Spark dependencies therefore need to be uploaded to an HDFS path that every node can read.

[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[atguigu@hadoop102 software]$ hadoop fs -put /opt/module/spark/jars/* /spark-jars
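
The upload takes a while; afterwards it is worth confirming that the jars actually landed in HDFS (optional check):

[atguigu@hadoop102 software]$ hadoop fs -count /spark-jars     # file count should roughly match /opt/module/spark/jars
[atguigu@hadoop102 software]$ hadoop fs -ls /spark-jars | tail -n 3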

7. Modify hive-site.xml

[atguigu@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml
<!-- Location of the Spark dependencies (note: port 8020 must match the NameNode's RPC port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://mycluster:8020/spark-jars/*</value>
</property>
  
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>

Note: hdfs://mycluster:8020/spark-jars/* in the configuration file is the NameNode address; mycluster is my HA nameservice name (if you are not running HA, just use the NameNode's IP address or hostname instead).

4. YARN configuration

1. Modify the configuration

[atguigu@hadoop102 hadoop]$ vim /opt/module/hadoop/etc/hadoop/capacity-scheduler.xml

If this property already exists, modify it; otherwise add it. (By default YARN caps the share of cluster resources that ApplicationMasters may use at 0.1; Hive on Spark keeps a long-running Spark ApplicationMaster per session, so the cap is raised to 0.8 here.)

<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>

2. Distribute it to the other nodes

[atguigu@hadoop102 hadoop]$ xsync capacity-scheduler.xml

3. Restart YARN

[atguigu@hadoop103 hadoop]$ stop-yarn.sh
[atguigu@hadoop103 hadoop]$ start-yarn.sh

5. Test

1. Run a test

[atguigu@hadoop102 hive]$ hive
hive (default)> create table student(id int, name string);
hive (default)> insert into table student values(1,'abc');
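
The first insert is slow because a Spark session has to be started on YARN. Once it finishes, an optional check that the row was written:

hive (default)> select * from student;

While the insert is running you should also see a SPARK-type application for the Hive session in the YARN ResourceManager web UI (port 8088 by default).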

2. Remote connection

[atguigu@hadoop102 hive]$ hiveserver2

Note: If this is a cloud server, remember to open the relevant port in the security group (HiveServer2 listens on port 10000 as configured above).
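
Before blaming the security group, it can also help to confirm HiveServer2 is actually listening on that port (a generic Linux check, assuming the ss tool is available):

[atguigu@hadoop102 hive]$ ss -lnt | grep 10000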
