赞
踩
hive是一个数据仓库工具。基于hadoop的,可以通过类sql的语句进行数据读、写、分析、管理。
特点:
hive默认端口是10000
元数据库:默认Derby。我们后面要切换为Mysql
##1. 解压apache-hive-1.2.1-bin.tar.gz
[root@hadoop software]# tar -zxvf apache-hive-1.2.1-bin.tar.gz
##2. 改名
[root@hadoop apps]# mv apache-hive-1.2.1-bin/ hive-1.2.1
##3. 配置环境变量
[root@hadoop hive-1.2.1]# vi /etc/profile
## 自定义环境变量
export JAVA_HOME=/opt/apps/jdk1.8.0_45
export HADOOP_HOME=/opt/apps/hadoop-2.8.1
export HIVE_HOME=/opt/apps/hive-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
export CLASS_PATH=.:$JAVA_HOME/lib
##4. 启动命令行
[root@hadoop bin]# start-dfs.sh
[root@hadoop bin]# start-yarn.sh
[root@hadoop bin]# hive
hive>show databases;
OK
default
hive> create table t_user(
> id int,
> name string);
hive> show tables;
OK
t_user
hive> insert into t_user values(1, 'zxy'); ## 这句话会执行mr
hive> select * from t_user;
hive> select count(*) from t_user; ## 这句话会执行mr
我们默认使用的存储元数据的是hive自带的derby数据库。这个数据库最大的缺点是只支持单session
[root@hadoop hive-1.2.1]# cd conf
[root@hadoop conf]#vim hive-site.xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop:3307/hive?createDataBaseIfNotExists=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>
mysql-connector-java-5.1.47-bin.jar
上传之后
- mysql的服务启动着
- mysql远程授权
mysql> grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option; mysql> flush privileges; # hadoop是当前虚拟机的主机名 mysql> grant all privileges on *.* to 'root'@'hadoop' identified by '123456' with grant option; mysql> flush privileges;
- 1
- 2
- 3
- 4
- 5
- HDFS/YARN开启
[root@hadoop conf]# start-dfs.sh [root@hadoop conf]# start-yarn.sh
- 1
- 2
- 环境变量
java+hadoop+hive环境变量配置成功
- 1
- 新建数据库(可视化工具中操作)
[root@hadoop conf]# hive
##1. 开启hiveserver2
[root@hadoop bin]# hive --service hiveserver2 &
##2. beeline客户端链接
[root@hadoop bin]# beeline
beeline> !connect jdbc:hive2://hadoop:10000
Enter username for jdbc:hive2://hadoop:10000: root
Enter password for jdbc:hive2://hadoop:10000: ******
0: jdbc:hive2://hadoop:10000> show databases;
OK
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (0.948 seconds)
0: jdbc:hive2://hadoop:10000> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| t_user |
+-----------+--+
1 row selected (0.037 seconds)
0: jdbc:hive2://hadoop:10000> select * from t_user;
OK
+------------+--------------+--+
| t_user.id | t_user.name |
+------------+--------------+--+
| 1 | lixi |
+------------+--------------+--+
1 row selected (0.347 seconds)
0: jdbc:hive2://hadoop:10000>
一进一出
聚集函数, 多进一出
类似于: count/max/min
一进多出
如 lateral view explode()
123456@qq.com
获取邮箱字符串’@'后字符串
SELECT
email.email,
(case when size(split(email.email, '@')) = 2 then split(t1.email, '@')[1] else '' end ) as email_suffix
FROM `email`
SELECT
email.email,
(case when cardinality(split(email.email, '@')) = 2 then split(t1.email, '@')[2] else '' end ) as email_suffix
FROM `email`
在计算数组长度的时候,hive和presto的函数不同
其中hive的size函数默认数组的下标从0开始
presto的cardinality函数默认数组的下标从1开始
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。