Environment: Ubuntu 18.04
Hive download page: https://dlcdn.apache.org/hive/
Download Hive:
wget https://dlcdn.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Extract Hive:
tar -zxf apache-hive-3.1.2-bin.tar.gz
Rename the directory:
mv apache-hive-3.1.2-bin hive
Configure environment variables:
vim /etc/profile
Append:
export HIVE_HOME=/root/hive
export HADOOP_HOME=/root/hadoop
export PATH=$PATH:$HIVE_HOME/bin
Apply the changes:
source /etc/profile
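To confirm the variables took effect (a quick check, not in the original):
echo $HIVE_HOME
hive --version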
Configure Hive's config files
# enter the Hive install directory
cd hive
# rename the default config template
mv conf/hive-default.xml.template conf/hive-default.xml
# create a hive-site.xml config file
vim conf/hive-site.xml
Write the following into hive-site.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
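With Connector/J 8.x, com.mysql.jdbc.Driver is deprecated in favor of com.mysql.cj.jdbc.Driver; the old name still loads (with a warning), but the property can be pointed at the new class instead:
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>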
Install MySQL:
apt install mysql-server
Start and stop MySQL:
systemctl start mysql
systemctl stop mysql
Verify that MySQL started successfully:
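One way to check (not shown in the original):
systemctl status mysql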
Download the MySQL JDBC driver:
https://downloads.mysql.com/archives/c-j/
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.25.tar.gz
Extract it:
tar -zxf mysql-connector-java-8.0.25.tar.gz
Copy the jar from the extracted directory into hive/lib:
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar hive/lib/
Log in to MySQL:
mysql -u root -p
Create the hive database:
create database hive;
Grant all privileges on every table in every database to the hive user; the trailing 'hive' is the connection password configured in hive-site.xml. (This GRANT ... IDENTIFIED BY form creates the user on MySQL 5.7, which is what apt installs on Ubuntu 18.04; MySQL 8 removed it, so there the user must be created first with CREATE USER.)
grant all on *.* to hive@localhost identified by 'hive';
Refresh MySQL's privilege tables:
flush privileges;
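To confirm the new user can connect (a quick check, not in the original):
mysql -u hive -p -e "show databases;"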
Start Hive:
bin/hive
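Once the CLI is up, a quick smoke test; if this throws HiveException java.lang.RuntimeException, see the troubleshooting section below:
show databases;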
Target schema of the stocks table; i_exchange is the renamed exchange column (the create statement below instead keeps exchange wrapped in backticks; see the troubleshooting notes on the exchange keyword):

col_name | data_type
---|---
i_exchange | string
symbol | string
ymd | string
price_open | float
price_high | float
price_low | float
price_close | float
volume | int
price_adj_close | float
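The table lives in a test database, which the walkthrough never creates explicitly; if it does not exist yet:
create database if not exists test;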
create table test.stocks (
`exchange` string,
symbol string,
ymd string,
price_open float,
price_high float,
price_low float,
price_close float,
volume int,
price_adj_close float
)
row format delimited fields terminated by ',';
Target schema of the partitioned dividends table:

col_name | data_type
---|---
ymd | string
dividend | float
exchange | string
symbol | string
create external table dividends (
ymd string,
dividend float)
partitioned by (`exchange` string, symbol string)
row format delimited fields terminated by ','
location 'hdfs://localhost:9000/hive/test/dividends';
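If the HDFS directory in the location clause does not exist yet, it can be created first (assuming HDFS at localhost:9000, as above):
hdfs dfs -mkdir -p /hive/test/dividends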
The stocks.csv file sits in my /root/my_tmp directory.
Load the data:
load data local inpath '/root/my_tmp/stocks.csv' into table stocks;
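A quick sanity check on the load (assuming the session is using the test database, as the bare table name above implies):
select * from stocks limit 5;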
Schema of the staging table dividends_unpartitioned (the same columns, but none of them partition columns):

col_name | data_type
---|---
ymd | string
dividend | float
exchange | string
symbol | string
Create the table dividends_unpartitioned:
create external table dividends_unpartitioned (
ymd string,
dividend float,
`exchange` string,
symbol string)
row format delimited fields terminated by ','
location 'hdfs://localhost:9000/hive/test/dividends_unpartitioned';
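As with dividends, the location directory may need creating first:
hdfs dfs -mkdir -p /hive/test/dividends_unpartitioned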
The dividends.csv file is stored in /root/my_tmp.
Load the data:
load data local inpath '/root/my_tmp/dividends.csv' overwrite into table dividends_unpartitioned;
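And a quick check that rows arrived (not in the original):
select count(*) from dividends_unpartitioned;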
A first attempt at the dynamic-partition insert; this fails until dynamic partitioning is enabled (see below, and the troubleshooting section):
insert overwrite table dividends partition (`exchange`, `symbol`) select `ymd`, `dividend`, `exchange`, `symbol` from dividends_unpartitioned;
Enable dynamic partitioning, raise the per-node partition limit, and rerun the insert:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;
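To confirm the partitions were created:
show partitions dividends;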
Query: closing prices of IBM on days it paid a dividend, from 2000 onward (LEFT SEMI JOIN acts as an EXISTS-style filter):
select s.ymd, s.symbol, s.price_close
from stocks s
LEFT SEMI JOIN
dividends d
ON s.ymd = d.ymd and s.symbol = d.symbol
where s.symbol = 'IBM' and year(s.ymd) >= 2000;
Query: whether AAPL rose, fell, or stayed unchanged on each trading day of October 2008:
select ymd,
  case
    when price_close - price_open > 0 then 'rise'
    when price_close - price_open < 0 then 'fall'
    else 'unchanged'
  end as situation
from stocks
where symbol = 'AAPL' and substring(ymd, 1, 7) = '2008-10';
Query: the single record with the largest close-minus-open gain:
select `exchange`, `symbol`, `ymd`, price_close, price_open, price_close - price_open as `diff`
from
(
  select *
  from stocks
  order by price_close - price_open desc
  limit 1
) t;
Query: years in which AAPL's average adjusted close on NASDAQ exceeded 50:
select
  year(ymd) as `year`,
  avg(price_adj_close) as avg_price
from stocks
where `exchange` = 'NASDAQ' and symbol = 'AAPL'
group by year(ymd)
having avg_price > 50;
Query: the top three symbols by average adjusted close in each year:
select t2.`year`, symbol, t2.avg_price
from (
  select *,
         row_number() over (partition by t1.`year` order by t1.avg_price desc) as `rank`
  from (
    select year(ymd) as `year`,
           symbol,
           avg(price_adj_close) as avg_price
    from stocks
    group by year(ymd), symbol
  ) t1
) t2
where t2.`rank` <= 3;
Problem:
Running show databases; in the Hive CLI throws HiveException java.lang.RuntimeException.
Fix:
Initialize the metastore schema:
bin/schematool -initSchema -dbType mysql
This raised another error:
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
Fix:
Copy the MySQL driver jar into Hive's lib directory:
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar hive/lib/
Then rerun the initialization:
bin/schematool -initSchema -dbType mysql
This resolved the problem.
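The metastore setup can also be verified afterwards (assuming the same MySQL-backed configuration):
bin/schematool -dbType mysql -info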
Problem: the following statement fails:
create table test.stocks (
exchange string,
symbol string,
ymd string,
price_open float,
price_high float,
price_low float,
price_close float,
volume int,
price_adj_close float
);
Fix: rename the exchange column (e.g. to i_exchange) or wrap it in backticks.
exchange is a reserved keyword in HiveQL: the EXCHANGE PARTITION statement moves a partition's data from one table to another table with the same schema that does not already have that partition.
Problem: error when creating a partitioned table:
create table dividends (
ymd string,
dividend float,
`exchange` string,
symbol string )
partitioned by (`exchange` string, symbol string);
Partition columns must not duplicate the table's regular columns.
In a dynamic-partition insert, the partition values come from the concrete values in the SELECT output, so nonstrict mode has to be enabled.
Fix:
set hive.exec.dynamic.partition.mode=nonstrict;
Error when running the partitioned insert:
insert overwrite table dividends partition (`exchange`, `symbol`) select `ymd`, `dividend`, `exchange`, `symbol` from dividends_unpartitioned;
Attempted fixes:
Check the Hive logs, which by default live under /tmp/{current user}.
The logs showed two problems: a Java OutOfMemory (heap space) error, and too few dynamic partitions.
Enlarge the heap: set the maximum heap for map and reduce tasks to 2048 MB:
set mapreduce.map.java.opts=-Xmx2048m;
set mapreduce.reduce.java.opts=-Xmx2048m;
Raise the per-node cap on dynamic partitions:
set hive.exec.max.dynamic.partitions.pernode=500;
Rerunning the partitioned import still failed.
Edit Hadoop's etc/hadoop/mapred-site.xml and add the following:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
Stop and restart Hadoop, then open Hive again.
If Hive fails to come up, Hadoop may still be in safe mode; wait a moment and retry.
Retried the operation; it still failed.
Set one more parameter, io.sort.mb (its current name on Hadoop 2+ is mapreduce.task.io.sort.mb). This sizes the in-memory buffer that holds map output before the shuffle; once the buffer fills past a threshold (a separate setting), a background thread sorts its contents and spills them to a file on local disk:
set io.sort.mb=10;
This got rid of the error, but later data operations may take noticeably longer.
Another option is to raise the maximum heap in Hadoop's hadoop-env.sh, as sketched below.
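A minimal sketch of the hadoop-env.sh change, assuming Hadoop 3.x (Hadoop 2.x uses HADOOP_HEAPSIZE instead); 2048 is an illustrative value, not from the original:
# etc/hadoop/hadoop-env.sh -- maximum daemon heap, in MB by default
export HADOOP_HEAPSIZE_MAX=2048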
Then restart Hadoop.