
Querying Hive Table Data

This article uses two tables, emp and dept.

The emp data:

  7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
  7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7788 SCOTT ANALYST 7566 1987-07-13 3000.0 0.0 20
  7839 KING PRESIDENT NULL 1981-11-07 5000.0 0.0 10
  7844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 30
  7876 ADAMS CLERK 7788 1987-07-13 1100.0 0.0 20
  7900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30
  7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
  7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10

The dept data:

  10 ACCOUNTING NEW YORK
  20 RESEARCH DALLAS
  30 SALES CHICAGO
  40 OPERATIONS BOSTON
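For readers who want to sanity-check the queries below without a Hive cluster, the two tables can be mirrored in plain Python (an illustrative sketch only; the tuples copy the rows listed above, with Hive's NULL represented as None):

```python
# Mirror of the emp table. Fields: empno, ename, job, mgr, hiredate, sal, comm, deptno
EMP = [
    (7499, "ALLEN",  "SALESMAN",  7698, "1981-02-20", 1600.0,  300.0, 30),
    (7521, "WARD",   "SALESMAN",  7698, "1981-02-22", 1250.0,  500.0, 30),
    (7566, "JONES",  "MANAGER",   7839, "1981-04-02", 2975.0,    0.0, 20),
    (7654, "MARTIN", "SALESMAN",  7698, "1981-09-28", 1250.0, 1400.0, 30),
    (7698, "BLAKE",  "MANAGER",   7839, "1981-05-01", 2850.0,    0.0, 30),
    (7782, "CLARK",  "MANAGER",   7839, "1981-06-09", 2450.0,    0.0, 10),
    (7788, "SCOTT",  "ANALYST",   7566, "1987-07-13", 3000.0,    0.0, 20),
    (7839, "KING",   "PRESIDENT", None, "1981-11-07", 5000.0,    0.0, 10),
    (7844, "TURNER", "SALESMAN",  7698, "1981-09-08", 1500.0,    0.0, 30),
    (7876, "ADAMS",  "CLERK",     7788, "1987-07-13", 1100.0,    0.0, 20),
    (7900, "JAMES",  "CLERK",     7698, "1981-12-03",  950.0,    0.0, 30),
    (7902, "FORD",   "ANALYST",   7566, "1981-12-03", 3000.0,    0.0, 20),
    (7934, "MILLER", "CLERK",     7782, "1982-01-23", 1300.0,    0.0, 10),
]
# Mirror of the dept table. Fields: deptno, dname, loc
DEPT = [
    (10, "ACCOUNTING", "NEW YORK"),
    (20, "RESEARCH",   "DALLAS"),
    (30, "SALES",      "CHICAGO"),
    (40, "OPERATIONS", "BOSTON"),
]

# The first query below (where sal > 3000) keeps only KING:
high_paid = [row for row in EMP if row[5] > 3000]
```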

1. where usage:

  hive (db_emp)> select * from emp where sal>3000;
  OK
  emp.empno emp.enname emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  7839 KING PRESIDENT NULL 1981-11-07 5000.0 0.0 10
  Time taken: 0.891 seconds, Fetched: 1 row(s)

 

2. order by usage: list all employees in department 10, sorted by salary in descending order:

select * from emp where deptno='10' order by sal desc;

Result:

  hive (db_emp)> select * from emp where deptno='10' order by sal desc;
  Query ID = root_20180725102525_ff119157-9589-4736-8780-4310923516fd
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks determined at compile time: 1
  In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
  Starting Job = job_1532453698068_0003, Tracking URL = http://master.cdh.com:8088/proxy/application_1532453698068_0003/
  Kill Command = /opt/cdh5.14.2/hadoop-2.6.0/bin/hadoop job -kill job_1532453698068_0003
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
  2018-07-25 10:25:56,776 Stage-1 map = 0%, reduce = 0%
  2018-07-25 10:26:09,404 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.72 sec
  2018-07-25 10:26:18,240 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.19 sec
  MapReduce Total cumulative CPU time: 5 seconds 190 msec
  Ended Job = job_1532453698068_0003
  MapReduce Jobs Launched:
  Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.19 sec HDFS Read: 10580 HDFS Write: 154 SUCCESS
  Total MapReduce CPU Time Spent: 5 seconds 190 msec
  OK
  emp.empno emp.enname emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  7839 KING PRESIDENT NULL 1981-11-07 5000.0 0.0 10
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10
  Time taken: 38.628 seconds, Fetched: 3 row(s)
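The same ordering can be checked in plain Python (a sketch using just the name, salary, and deptno columns of the department-10 rows from the sample data):

```python
# (ename, sal, deptno) for the three department-10 employees
DEPT10 = [("CLARK", 2450.0, 10), ("KING", 5000.0, 10), ("MILLER", 1300.0, 10)]

# order by sal desc: sort the filtered rows by salary, highest first
ordered = sorted(DEPT10, key=lambda r: r[1], reverse=True)
names = [name for name, sal, deptno in ordered]  # KING, CLARK, MILLER
```

Note that Hive's `order by` forces a single reducer to produce one globally sorted output, which is why the job above ran with exactly one reducer.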

3. limit usage:

  hive (db_emp)> select * from emp limit 5;
  OK
  emp.empno emp.enname emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
  7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  Time taken: 0.162 seconds, Fetched: 5 row(s)

4. distinct usage: remove duplicate records from the query result

select distinct deptno from emp;

Result:

  Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.51 sec HDFS Read: 8303 HDFS Write: 9 SUCCESS
  Total MapReduce CPU Time Spent: 4 seconds 510 msec
  OK
  deptno
  10
  20
  30
  Time taken: 32.66 seconds, Fetched: 3 row(s)
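Conceptually, `distinct` collapses duplicate values; a minimal Python sketch over the deptno column (copied in table order from the emp data above):

```python
# The deptno column of emp, in table order
deptnos = [30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]

# distinct keeps one copy of each value; the reduce stage in the Hive job
# above is what deduplicates, and the output comes back sorted here as well
distinct_deptnos = sorted(set(deptnos))  # [10, 20, 30]
```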

5. between ... and usage:

  hive (db_emp)> select * from emp where sal between 2000 and 3000;
  OK
  emp.empno emp.enname emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7788 SCOTT ANALYST 7566 1987-07-13 3000.0 0.0 20
  7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
  Time taken: 0.117 seconds, Fetched: 5 row(s)
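Note that `between` is inclusive on both ends, which is why the two 3000.0 salaries (SCOTT and FORD) appear in the result. A Python sketch of the same inclusive range check over the salaries listed above:

```python
# ename -> sal, copied from the emp data above
SALS = {"ALLEN": 1600.0, "WARD": 1250.0, "JONES": 2975.0, "MARTIN": 1250.0,
        "BLAKE": 2850.0, "CLARK": 2450.0, "SCOTT": 3000.0, "KING": 5000.0,
        "TURNER": 1500.0, "ADAMS": 1100.0, "JAMES": 950.0, "FORD": 3000.0,
        "MILLER": 1300.0}

# between 2000 and 3000 == 2000 <= sal <= 3000 (both endpoints included)
in_range = sorted(name for name, sal in SALS.items() if 2000 <= sal <= 3000)
```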

6. count usage:

  select count(1) from emp;
  or
  select count(*) from emp; (often cited as slightly less efficient)

7. avg usage: compute the average

select avg(sal) avg_sal from emp;

Result:

  Total MapReduce CPU Time Spent: 5 seconds 380 msec
  OK
  avg_sal
  2171.153846153846
  Time taken: 31.874 seconds, Fetched: 1 row(s)
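The reported average can be reproduced by hand from the 13 salaries above (28225 / 13):

```python
# All 13 salaries from the emp data above
sals = [1600.0, 1250.0, 2975.0, 1250.0, 2850.0, 2450.0, 3000.0,
        5000.0, 1500.0, 1100.0, 950.0, 3000.0, 1300.0]

avg_sal = sum(sals) / len(sals)  # 28225.0 / 13 ~= 2171.1538
```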

8. group by usage:

select deptno, avg(sal) from emp group by deptno;

Result:

  Total MapReduce CPU Time Spent: 2 seconds 730 msec
  OK
  deptno _c1
  10 2916.6666666666665
  20 2518.75
  30 1566.6666666666667
  Time taken: 28.437 seconds, Fetched: 3 row(s)
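The per-department averages can be verified with a small Python sketch of the group-by logic (the (sal, deptno) pairs are copied from the emp data above):

```python
from collections import defaultdict

# (sal, deptno) pairs from the emp data above
ROWS = [(1600.0, 30), (1250.0, 30), (2975.0, 20), (1250.0, 30), (2850.0, 30),
        (2450.0, 10), (3000.0, 20), (5000.0, 10), (1500.0, 30), (1100.0, 20),
        (950.0, 30), (3000.0, 20), (1300.0, 10)]

# group by deptno: collect each department's salaries, then aggregate
groups = defaultdict(list)
for sal, deptno in ROWS:
    groups[deptno].append(sal)

avg_by_dept = {d: sum(s) / len(s) for d, s in groups.items()}
```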

9. having usage: similar to where, but applied after aggregation

select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;

Result:

  Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.41 sec HDFS Read: 9666 HDFS Write: 33 SUCCESS
  Total MapReduce CPU Time Spent: 4 seconds 410 msec
  OK
  deptno avg_sal
  10 2916.6666666666665
  20 2518.75
  Time taken: 59.368 seconds, Fetched: 2 row(s)
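In other words, `having` filters the already-aggregated groups. Continuing from the per-department averages shown in the group-by result above, a one-line sketch:

```python
# Per-department averages, taken from the group by result above
avg_by_dept = {10: 2916.6666666666665, 20: 2518.75, 30: 1566.6666666666667}

# having avg_sal > 2000: drop whole groups whose aggregate fails the test
kept = sorted(d for d, avg in avg_by_dept.items() if avg > 2000)  # depts 10 and 20
```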

The difference between where and having: where filters individual rows before grouping and cannot reference aggregate values, while having filters whole groups after the aggregation has run.

10. join usage: equi-join, left join, right join, full join

Equi-join:

select e.empno, e.enname, e.sal, d.deptno from emp e join dept d on e.deptno=d.deptno;

Result:

  Stage-Stage-3: Map: 1 Cumulative CPU: 1.8 sec HDFS Read: 7480 HDFS Write: 285 SUCCESS
  Total MapReduce CPU Time Spent: 1 seconds 800 msec
  OK
  e.empno e.enname e.sal d.deptno
  7499 ALLEN 1600.0 30
  7521 WARD 1250.0 30
  7566 JONES 2975.0 20
  7654 MARTIN 1250.0 30
  7698 BLAKE 2850.0 30
  7782 CLARK 2450.0 10
  7788 SCOTT 3000.0 20
  7839 KING 5000.0 10
  7844 TURNER 1500.0 30
  7876 ADAMS 1100.0 20
  7900 JAMES 950.0 30
  7902 FORD 3000.0 20
  7934 MILLER 1300.0 10
  Time taken: 27.963 seconds, Fetched: 13 row(s)

Left join:

select e.empno, e.enname, e.sal, d.deptno from emp e left join dept d on e.deptno=d.deptno;

Because every employee has a matching department in this data, the result is the same as above.

Right join:

select e.empno, e.enname, e.sal, d.deptno from emp e right join dept d on e.deptno=d.deptno;

Result:

  Stage-Stage-3: Map: 1 Cumulative CPU: 1.71 sec HDFS Read: 6727 HDFS Write: 297 SUCCESS
  Total MapReduce CPU Time Spent: 1 seconds 710 msec
  OK
  e.empno e.enname e.sal d.deptno
  7782 CLARK 2450.0 10
  7839 KING 5000.0 10
  7934 MILLER 1300.0 10
  7566 JONES 2975.0 20
  7788 SCOTT 3000.0 20
  7876 ADAMS 1100.0 20
  7902 FORD 3000.0 20
  7499 ALLEN 1600.0 30
  7521 WARD 1250.0 30
  7654 MARTIN 1250.0 30
  7698 BLAKE 2850.0 30
  7844 TURNER 1500.0 30
  7900 JAMES 950.0 30
  NULL NULL NULL 40
  Time taken: 31.193 seconds, Fetched: 14 row(s)
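The right join keeps every dept row, emitting NULLs for the emp columns when no employee matches; that is where the `NULL NULL NULL 40` row comes from (department 40, OPERATIONS, has no employees). A Python sketch of the same logic over the sample data:

```python
# (ename, sal, deptno) from emp; (deptno, dname) from dept
EMP = [("CLARK", 2450.0, 10), ("KING", 5000.0, 10), ("MILLER", 1300.0, 10),
       ("JONES", 2975.0, 20), ("SCOTT", 3000.0, 20), ("ADAMS", 1100.0, 20),
       ("FORD", 3000.0, 20), ("ALLEN", 1600.0, 30), ("WARD", 1250.0, 30),
       ("MARTIN", 1250.0, 30), ("BLAKE", 2850.0, 30), ("TURNER", 1500.0, 30),
       ("JAMES", 950.0, 30)]
DEPT = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]

# right join: keep every dept row; use None (Hive NULL) when no emp matches
result = []
for deptno, dname in DEPT:
    matches = [e for e in EMP if e[2] == deptno]
    if matches:
        result.extend((ename, sal, deptno) for ename, sal, _ in matches)
    else:
        result.append((None, None, deptno))  # dept 40 has no employees
```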

 

Full join:

select e.empno, e.enname, e.sal, d.deptno from emp e full join dept d on e.deptno=d.deptno;

With this data the full join adds nothing beyond the right join (the only unmatched row, department 40, already appears there), so the result is the same as above.

12. Exporting data to a local directory:

insert overwrite local directory '/opt/data/emp' row format delimited fields terminated by '\t' select * from emp;

Result, under the local path /opt/data/emp/:

  [root@master emp]# more 000000_0
  7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
  7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7788 SCOTT ANALYST 7566 1987-07-13 3000.0 0.0 20
  7839 KING PRESIDENT \N 1981-11-07 5000.0 0.0 10
  7844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 30
  7876 ADAMS CLERK 7788 1987-07-13 1100.0 0.0 20
  7900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30
  7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
  7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10

13. sort by

First set the number of reducers:

  hive (db_emp)> set mapreduce.job.reduces=3;
  hive (db_emp)> set mapreduce.job.reduces;
  mapreduce.job.reduces=3

Run the sort by query:

insert overwrite local directory '/opt/data/emp_sort' row format delimited fields terminated by '\t' select * from emp sort by sal;

Result:

Three files are produced under the local path /opt/data/emp_sort. Their contents are shown below; each reducer sorts its own rows by sal:

  [root@master data]# cd emp_sort/
  [root@master emp_sort]# cat 000000_0
  7876 ADAMS CLERK 7788 1987-07-13 1100.0 0.0 20
  7844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 30
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  7788 SCOTT ANALYST 7566 1987-07-13 3000.0 0.0 20
  7839 KING PRESIDENT \N 1981-11-07 5000.0 0.0 10
  [root@master emp_sort]# cat 000001_0
  7900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30
  7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
  7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
  [root@master emp_sort]# cat 000002_0
  7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10
  7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
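The key point: with `sort by`, each reducer's output file is sorted, but the concatenation of the files is not globally sorted (unlike `order by`). A Python sketch of that behavior, using round-robin as a stand-in for however Hive happens to spread rows across the three reducers:

```python
# (ename, sal) pairs from the emp data above
ROWS = [("ALLEN", 1600.0), ("WARD", 1250.0), ("JONES", 2975.0), ("MARTIN", 1250.0),
        ("BLAKE", 2850.0), ("CLARK", 2450.0), ("SCOTT", 3000.0), ("KING", 5000.0),
        ("TURNER", 1500.0), ("ADAMS", 1100.0), ("JAMES", 950.0), ("FORD", 3000.0),
        ("MILLER", 1300.0)]

# Without distribute by, row-to-reducer assignment is arbitrary;
# round-robin here is only an illustrative stand-in for it.
partitions = [[], [], []]
for i, row in enumerate(ROWS):
    partitions[i % 3].append(row)

# sort by: each "reducer" sorts its own rows by sal
for p in partitions:
    p.sort(key=lambda r: r[1])
```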

14. distribute by

insert overwrite local directory '/opt/data/emp_dist' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;

Checking the local path, the data is partitioned by deptno, and each partition is sorted by sal:

  [root@master data]# cd emp_dist/
  [root@master emp_dist]# more 000000_0
  7900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30
  7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
  7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
  7844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 30
  7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
  7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
  [root@master emp_dist]# more 000001_0
  7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10
  7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
  7839 KING PRESIDENT \N 1981-11-07 5000.0 0.0 10
  [root@master emp_dist]# more 000002_0
  7876 ADAMS CLERK 7788 1987-07-13 1100.0 0.0 20
  7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
  7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
  7788 SCOTT ANALYST 7566 1987-07-13 3000.0 0.0 20
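`distribute by` sends all rows with the same key to the same reducer, and `sort by` then orders each reducer's rows. A Python sketch: Hive actually hashes the distribute-by key to choose a reducer, so `deptno % 3` below is only a simple stand-in for that hash (with 3 reducers it happens to reproduce the three-file layout shown above: dept 30 in file 0, dept 10 in file 1, dept 20 in file 2):

```python
# (sal, deptno) pairs from the emp data above
ROWS = [(1600.0, 30), (1250.0, 30), (2975.0, 20), (1250.0, 30), (2850.0, 30),
        (2450.0, 10), (3000.0, 20), (5000.0, 10), (1500.0, 30), (1100.0, 20),
        (950.0, 30), (3000.0, 20), (1300.0, 10)]

# distribute by deptno: route each row to a reducer by (a stand-in) hash of the key
buckets = {0: [], 1: [], 2: []}
for sal, deptno in ROWS:
    buckets[deptno % 3].append((sal, deptno))

# sort by sal: each reducer sorts only its own rows
for rows in buckets.values():
    rows.sort()
```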

15. cluster by: cluster by col is shorthand for distribute by col sort by col (ascending order only):

insert overwrite local directory '/opt/data/emp_clu' row format delimited fields terminated by '\t' select * from emp cluster by sal;

 
