
Using Hive

Part 1: Connecting to Hive

Server mode

You can start Hive as an interactive shell:
[root@mini1 ~]# cd apps/hive/bin
[root@mini1 bin]# ll
总用量 32
-rwxr-xr-x. 1 root root 1031 4月 30 2015 beeline
drwxr-xr-x. 3 root root 4096 10月 17 12:38 ext
-rwxr-xr-x. 1 root root 7844 5月 8 2015 hive
-rwxr-xr-x. 1 root root 1900 4月 30 2015 hive-config.sh
-rwxr-xr-x. 1 root root 885 4月 30 2015 hiveserver2
-rwxr-xr-x. 1 root root 832 4月 30 2015 metatool
-rwxr-xr-x. 1 root root 884 4月 30 2015 schematool
[root@mini1 bin]# ./hive
hive>

Client mode

However, that interface is not very friendly. Hive can also be exposed as a service (the HiveServer2 Thrift service) and connected to with the bundled beeline client, as follows.

Window 1: start the service

[root@mini1 ~]# cd apps/hive/bin
[root@mini1 bin]# ll
总用量 32
-rwxr-xr-x. 1 root root 1031 4月 30 2015 beeline
drwxr-xr-x. 3 root root 4096 10月 17 12:38 ext
-rwxr-xr-x. 1 root root 7844 5月 8 2015 hive
-rwxr-xr-x. 1 root root 1900 4月 30 2015 hive-config.sh
-rwxr-xr-x. 1 root root 885 4月 30 2015 hiveserver2
-rwxr-xr-x. 1 root root 832 4月 30 2015 metatool
-rwxr-xr-x. 1 root root 884 4月 30 2015 schematool
[root@mini1 bin]# ./hiveserver2

Window 2: connect as a client

[root@mini1 bin]# ./beeline
Beeline version 1.2.1 by Apache Hive
beeline> [root@mini1 bin]#
[root@mini1 bin]# ./beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: root
Enter password for jdbc:hive2://localhost:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>

You may hit this error:
Error: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/tmp":hadoop3:supergroup:drwx------

The connecting user (root here) lacks execute permission on /tmp in HDFS. Open up the permissions and reconnect:

./hadoop dfs -chmod -R 777 /tmp
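As a side note, instead of typing !connect interactively you can pass the JDBC URL and username straight to beeline on the command line; a minimal sketch (host, port, and user are the same assumptions as above):

[root@mini1 bin]# ./beeline -u jdbc:hive2://localhost:10000 -n root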

Part 2: Basic usage

1. Show databases

0: jdbc:hive2://localhost:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (1.456 seconds)

2. Create and use a database, then list its tables

0: jdbc:hive2://localhost:10000> create database myhive;
No rows affected (0.576 seconds)
0: jdbc:hive2://localhost:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| myhive |
+----------------+--+
0: jdbc:hive2://localhost:10000> use myhive;
No rows affected (0.265 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
+-----------+--+

3. Create a table

0: jdbc:hive2://localhost:10000> create table emp(id int,name string);
No rows affected (0.29 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| emp |
+-----------+--+
1 row selected (0.261 seconds)
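The new table is backed by a directory under Hive's warehouse path in HDFS. A quick way to confirm this from inside beeline is the dfs command (a sketch; the warehouse path shown is the default one used later in this article):

0: jdbc:hive2://localhost:10000> dfs -ls /user/hive/warehouse/myhive.db;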

Next we upload data into that directory. Viewed from the HDFS web UI it is just a directory, as below:

(screenshot: the empty emp table directory in the HDFS web UI)

Since it contains no files, the table has no data yet, so we need to upload a file into that directory.

[root@mini1 ~]# cat sz.data
1,zhangsan
2,lisi
3,wangwu
4,furong
5,fengjie
[root@mini1 ~]# hadoop fs -put sz.data /user/hive/warehouse/myhive.db/emp
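You can also verify from the shell that the file landed in the table's directory (a quick sketch using standard HDFS commands):

[root@mini1 ~]# hadoop fs -ls /user/hive/warehouse/myhive.db/emp
[root@mini1 ~]# hadoop fs -cat /user/hive/warehouse/myhive.db/emp/sz.data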

Check the web UI again:

(screenshot: sz.data now present under the emp table directory)

4. Query the table

0: jdbc:hive2://localhost:10000> select * from emp;
+---------+-----------+--+
| emp.id | emp.name |
+---------+-----------+--+
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
+---------+-----------+--+

Of course the results are all NULL: the table was created without specifying ',' as the field delimiter, while the fields in the file are comma-separated. So drop the table, recreate it with the statement below, and upload the file again:

0: jdbc:hive2://localhost:10000> drop table emp;
No rows affected (1.122 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
+-----------+--+
0: jdbc:hive2://localhost:10000> create table emp(id int,name string)
0: jdbc:hive2://localhost:10000> row format delimited
0: jdbc:hive2://localhost:10000> fields terminated by ',';
No rows affected (0.265 seconds)
0: jdbc:hive2://localhost:10000>
[root@mini1 ~]# hadoop fs -put sz.data /user/hive/warehouse/myhive.db/emp
0: jdbc:hive2://localhost:10000> select * from emp;
+---------+-----------+--+
| emp.id | emp.name |
+---------+-----------+--+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wangwu |
| 4 | furong |
| 5 | fengjie |
+---------+-----------+--+

6. Conditional queries

0: jdbc:hive2://localhost:10000> select id,name from emp where id>2 order by id desc;
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_1508216103995_0004
INFO : The url to track the job: http://mini1:8088/proxy/application_1508216103995_0004/
INFO : Starting Job = job_1508216103995_0004, Tracking URL = http://mini1:8088/proxy/application_1508216103995_0004/
INFO : Kill Command = /root/apps/hadoop-2.6.4/bin/hadoop job -kill job_1508216103995_0004
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2017-10-18 00:35:39,865 Stage-1 map = 0%, reduce = 0%
INFO : 2017-10-18 00:35:46,275 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
INFO : 2017-10-18 00:35:51,487 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.34 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 340 msec
INFO : Ended Job = job_1508216103995_0004
+-----+----------+--+
| id | name |
+-----+----------+--+
| 5 | fengjie |
| 4 | furong |
| 3 | wangwu |
+-----+----------+--+
3 rows selected (18.96 seconds)

This makes it clear what is happening: the SQL you write is ultimately compiled into a MapReduce job and run on YARN; Hive essentially provides a large library of MapReduce templates.
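If you want to see the plan Hive generates without actually running the job, EXPLAIN prints the stages (including the map and reduce stages) for a query; a minimal sketch:

0: jdbc:hive2://localhost:10000> explain select id,name from emp where id>2 order by id desc;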

7. Create an external table

A managed (internal) table keeps its data under the fixed warehouse path, while an external table can point at data in any location.
The key difference: when an external table is dropped, the directory it points to (and the data in it) is not deleted.

0: jdbc:hive2://localhost:10000> create external table emp2(id int,name string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ','    -- comma-delimited fields
0: jdbc:hive2://localhost:10000> stored as textfile                               -- stored as plain text
0: jdbc:hive2://localhost:10000> location '/company';                             -- data lives under /company
No rows affected (0.101 seconds)
0: jdbc:hive2://localhost:10000> dfs -ls /;
+----------------------------------------------------------------------------------------+--+
| DFS Output |
+----------------------------------------------------------------------------------------+--+
| Found 16 items |
| -rw-r--r-- 2 angelababy mygirls 7 2017-10-01 20:22 /canglaoshi_wuma.avi |
| -rw-r--r-- 2 root supergroup 22 2017-10-03 21:12 /cangmumayi.avi |
| drwxr-xr-x - root supergroup 0 2017-10-18 00:55 /company |
| drwxr-xr-x - root supergroup 0 2017-10-13 04:44 /flowcount |
| drwxr-xr-x - root supergroup 0 2017-10-17 03:44 /friends |
| drwxr-xr-x - root supergroup 0 2017-10-17 06:19 /gc |
| drwxr-xr-x - root supergroup 0 2017-10-07 07:28 /liushishi.log |
| -rw-r--r-- 3 12706 supergroup 60 2017-10-04 21:58 /liushishi.love |
| drwxr-xr-x - root supergroup 0 2017-10-17 07:32 /logenhance |
| -rw-r--r-- 2 root supergroup 26 2017-10-16 20:49 /mapjoin |
| drwxr-xr-x - root supergroup 0 2017-10-16 21:16 /mapjoincache |
| drwxr-xr-x - root supergroup 0 2017-10-13 13:15 /mrjoin |
| drwxr-xr-x - root supergroup 0 2017-10-16 23:35 /reverse |
| drwx------ - root supergroup 0 2017-10-17 13:10 /tmp |
| drwxr-xr-x - root supergroup 0 2017-10-17 13:13 /user |
| drwxr-xr-x - root supergroup 0 2017-10-14 01:33 /wordcount |
+----------------------------------------------------------------------------------------+--+
0: jdbc:hive2://localhost:10000> create external table emp2(id int,name string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by '\t'
0: jdbc:hive2://localhost:10000> stored as textfile
0: jdbc:hive2://localhost:10000> location '/company';
No rows affected (0.135 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| emp |
| emp2 |
| t_sz_ext |
+-----------+--+

You can see the new /company directory and the extra tables, although at this point /company contains nothing.
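To check whether a given table is managed or external (and where its data lives), DESCRIBE FORMATTED reports the Table Type (MANAGED_TABLE vs EXTERNAL_TABLE) and the Location; a minimal sketch:

0: jdbc:hive2://localhost:10000> describe formatted emp2;
0: jdbc:hive2://localhost:10000> describe formatted emp;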

8. Load a file into a table

Earlier we used a hadoop command to put the file into the table's directory, but you can also load a file directly from the beeline command line:

0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz.data' into table emp2;   (you could also upload with hadoop fs as before; omitting LOCAL loads the file from HDFS instead of the local filesystem)
INFO : Loading data to table myhive.emp2 from file:/root/sz.data
INFO : Table myhive.emp2 stats: [numFiles=0, totalSize=0]
No rows affected (0.414 seconds)
0: jdbc:hive2://localhost:10000> select * from emp2;
+----------+------------+--+
| emp2.id | emp2.name |
+----------+------------+--+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wangwu |
| 4 | furong |
| 5 | fengjie |
+----------+------------+--+
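As noted above, dropping the LOCAL keyword makes the path refer to HDFS rather than the local filesystem (the HDFS file is then moved into the table's directory), and adding OVERWRITE replaces the table's existing contents. A minimal sketch, assuming sz.data has already been uploaded to the HDFS root:

0: jdbc:hive2://localhost:10000> load data inpath '/sz.data' overwrite into table emp2;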

9. Partitioned tables: partition by a school column and load data into two different partitions

0: jdbc:hive2://localhost:10000> create table stu(id int,name string)
0: jdbc:hive2://localhost:10000> partitioned by(school string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ',';
No rows affected (0.319 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| emp |
| emp2 |
| stu |
| t_sz_ext |
+-----------+--+
0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz.data' into table stu partition(school='scu');
INFO : Loading data to table myhive.stu partition (school=scu) from file:/root/sz.data
INFO : Partition myhive.stu{school=scu} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (0.607 seconds)
0: jdbc:hive2://localhost:10000> select * from stu;
+---------+-----------+-------------+--+
| stu.id | stu.name | stu.school |
+---------+-----------+-------------+--+
| 1 | zhangsan | scu |
| 2 | lisi | scu |
| 3 | wangwu | scu |
| 4 | furong | scu |
| 5 | fengjie | scu |
+---------+-----------+-------------+--+
5 rows selected (0.286 seconds)
0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz2.data' into table stu partition(school='hfut');
INFO : Loading data to table myhive.stu partition (school=hfut) from file:/root/sz2.data
INFO : Partition myhive.stu{school=hfut} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (0.671 seconds)
0: jdbc:hive2://localhost:10000> select * from stu;
+---------+-----------+-------------+--+
| stu.id | stu.name | stu.school |
+---------+-----------+-------------+--+
| 1 | Tom | hfut |
| 2 | Jack | hfut |
| 3 | Lucy | hfut |
| 4 | Kitty | hfut |
| 5 | Lucene | hfut |
| 6 | Sakura | hfut |
| 1 | zhangsan | scu |
| 2 | lisi | scu |
| 3 | wangwu | scu |
| 4 | furong | scu |
| 5 | fengjie | scu |
+---------+-----------+-------------+--+
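Each partition is stored as its own subdirectory (school=scu, school=hfut) under the table's directory, and filtering on the partition column only reads the matching partition. A minimal sketch:

0: jdbc:hive2://localhost:10000> show partitions stu;
0: jdbc:hive2://localhost:10000> select * from stu where school='hfut';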

Note: Hive does not follow the relational normal forms, so don't go looking for primary keys.

10. Add a partition

0: jdbc:hive2://localhost:10000> alter table stu add partition (school='Tokyo');
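ADD PARTITION only registers an empty partition on the table; data can then be loaded into it the same way as before (and a partition can later be removed with ALTER TABLE stu DROP PARTITION (school='Tokyo')). A minimal sketch, where sz3.data is a hypothetical file:

0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz3.data' into table stu partition(school='Tokyo');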

For a more intuitive view, check the HDFS web UI:

(screenshots: the stu table's partition directories in the HDFS web UI)
