
The Big Data Era -- Hive Data Manipulation: INSERT

The previous article covered Hive's data definition language, which deals mainly with table structure: creating tables, dropping tables, altering a table's structure, adding columns, and so on. Next comes the data manipulation language, which deals with inserting, deleting, updating, and querying the data inside tables.


Inserting Data


Insertion is done through a query. Unlike MySQL, you cannot simply insert rows one at a time: I tried several times in Hive without success, and everything I found online also inserts via a SELECT statement. This is one way Hive differs from ordinary SQL. (Newer Hive versions do add row-level inserts; see the note after the transcript below.)

  hive> insert overwrite table salaries select * from salaries_external where year = 2012;
  FAILED: SemanticException [Error 10004]: Line 1:70 Invalid table alias or column reference 'year': (possible column names are: yearid, teamid, lgid, playerid, salary)
  hive> insert overwrite table salaries select * from salaries_external where yearid = 2012;
  Total jobs = 3
  Launching Job 1 out of 3
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_1474609415284_0001, Tracking URL = http://hadoopwy1:8088/proxy/application_1474609415284_0001/
  Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1474609415284_0001
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  2016-09-23 03:59:05,119 Stage-1 map = 0%, reduce = 0%
  2016-09-23 03:59:33,232 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.13 sec
  MapReduce Total cumulative CPU time: 2 seconds 130 msec
  Ended Job = job_1474609415284_0001
  Stage-4 is selected by condition resolver.
  Stage-3 is filtered out by condition resolver.
  Stage-5 is filtered out by condition resolver.
  Moving data to: hdfs://hadoopnodeservice1/tmp/hive-root/hive_2016-09-23_03-57-59_773_846459559745817905-1/-ext-10000
  Loading data to table default.salaries
  rmr: DEPRECATED: Please use 'rm -r' instead.
  Deleted hdfs://hadoopnodeservice1/user/hive/warehouse/salaries
  Table default.salaries stats: [numFiles=1, numRows=1696, totalSize=52808, rawDataSize=51112]
  MapReduce Jobs Launched:
  Job 0: Map: 1 Cumulative CPU: 2.13 sec HDFS Read: 1354022 HDFS Write: 52885 SUCCESS
  Total MapReduce CPU Time Spent: 2 seconds 130 msec
  OK
  Time taken: 98.887 seconds
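As an aside: Hive 0.14 and later do support row-level inserts via INSERT ... VALUES, so on a newer cluster than the one used here something like the following should work (a sketch; the column order is taken from the error message above):

  hive> INSERT INTO TABLE salaries VALUES (2012, 'WAS', 'NL', 'carrobr01', 580000.0);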

Looking at the middle of the output, you can see the job ran a single map phase and never started a reduce phase. Part of the resulting data looks like this:

  2012 WAS NL carrobr01 580000.0
  2012 WAS NL desmoia01 512500.0
  2012 WAS NL espinda01 506000.0
  2012 WAS NL storedr01 498750.0
  2012 WAS NL bernaro01 493500.0
  2012 WAS NL ramoswi01 491250.0
  2012 WAS NL rodrihe03 491000.0
  2012 WAS NL detwiro01 485000.0
  2012 WAS NL stammcr01 485000.0
  2012 WAS NL marrech01 481000.0
  2012 WAS NL matthry01 481000.0
  2012 WAS NL lombast02 481000.0
  Time taken: 0.077 seconds, Fetched: 1696 row(s)


Inserting Data into a Specified Partition

This command requires that the table was already partitioned in some form when it was created.
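The original CREATE TABLE statement isn't shown, but judging from the DESCRIBE output below, salaries_partition could have been created roughly like this (a sketch; everything beyond the column names, types, and partition key is an assumption):

  hive> CREATE TABLE salaries_partition (
      >   teamid STRING,
      >   playerid STRING,
      >   lgid STRING,
      >   salary FLOAT)
      > PARTITIONED BY (yearid INT);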

  hive> describe salaries_partition;
  OK
  teamid string
  playerid string
  lgid string
  salary float
  yearid int
  # Partition Information
  # col_name data_type comment
  yearid int
  Time taken: 0.122 seconds, Fetched: 10 row(s)

Here I partitioned by year. In the insert statement below, note that yearid must not appear in the SELECT list: the PARTITION clause already fixes yearid, and in Hive a partition column is a virtual column appended after the data columns rather than a regular column. If you select yearid as well, Hive throws an exception complaining that the target table has 4 columns while the query produced 5, so the columns cannot be matched up.

  hive> insert overwrite table salaries_partition partition(yearid = 2012) select ex.teamid, ex.playerid, ex.lgid, ex.salary from salaries_external ex where ex.yearid = 2012;
  Total jobs = 3
  Launching Job 1 out of 3
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_1474609415284_0005, Tracking URL = http://hadoopwy1:8088/proxy/application_1474609415284_0005/
  Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1474609415284_0005
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  2016-09-23 05:10:09,350 Stage-1 map = 0%, reduce = 0%
  2016-09-23 05:10:25,157 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.79 sec
  MapReduce Total cumulative CPU time: 1 seconds 790 msec
  Ended Job = job_1474609415284_0005
  Stage-4 is selected by condition resolver.
  Stage-3 is filtered out by condition resolver.
  Stage-5 is filtered out by condition resolver.
  Moving data to: hdfs://hadoopnodeservice1/tmp/hive-root/hive_2016-09-23_05-09-55_703_8904828298967067898-1/-ext-10000
  Loading data to table default.salaries_partition partition (yearid=2012)
  Partition default.salaries_partition{yearid=2012} stats: [numFiles=1, numRows=1696, totalSize=44328, rawDataSize=42632]
  MapReduce Jobs Launched:
  Job 0: Map: 1 Cumulative CPU: 1.79 sec HDFS Read: 1354022 HDFS Write: 44427 SUCCESS
  Total MapReduce CPU Time Spent: 1 seconds 790 msec
  OK
  Time taken: 32.188 seconds
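The session above doesn't show it, but SHOW PARTITIONS is a quick way to verify the result; after this insert it should list a single yearid=2012 partition:

  hive> SHOW PARTITIONS salaries_partition;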

Dynamic Partition Insert

  hive> insert overwrite table salaries_partition partition(yearid) select ex.teamid, ex.playerid, ex.lgid, ex.salary from salaries_external ex;
  FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
An exception occurs here: Hive is in strict mode, which forbids a fully dynamic partition insert, so you have to switch from strict to nonstrict mode first.


Even after switching strict mode off, the job can still fail; three parameters come into play. hive.exec.dynamic.partition enables dynamic partitioning, hive.exec.dynamic.partition.mode=nonstrict allows every partition column to be determined dynamically, and hive.exec.max.dynamic.partitions.pernode raises the cap on how many partitions a single mapper or reducer may create. The first two are set like this (the third is set at the top of the transcript below):

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;

  hive> set hive.exec.max.dynamic.partitions.pernode=3000;
  hive> insert overwrite table salaries_partition partition(yearid) select ex.yearid, ex.teamid, ex.playerid, ex.lgid, ex.salary from salaries_external ex;
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks not specified. Estimated from input data size: 1
  In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
  Starting Job = job_1474609415284_0007, Tracking URL = http://hadoopwy1:8088/proxy/application_1474609415284_0007/
  Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1474609415284_0007
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
  2016-09-23 06:20:01,525 Stage-1 map = 0%, reduce = 0%
  2016-09-23 06:20:32,504 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.1 sec
  2016-09-23 06:20:55,706 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 5.39 sec
  2016-09-23 06:20:58,864 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 6.02 sec
  2016-09-23 06:21:02,023 Stage-1 map = 100%, reduce = 72%, Cumulative CPU 6.53 sec
  2016-09-23 06:21:05,182 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 6.99 sec
  2016-09-23 06:21:08,338 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 7.61 sec
  2016-09-23 06:21:10,432 Stage-1 map = 100%, reduce = 76%, Cumulative CPU 8.12 sec
  2016-09-23 06:21:13,578 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 8.57 sec
  2016-09-23 06:21:16,729 Stage-1 map = 100%, reduce = 78%, Cumulative CPU 9.08 sec
  2016-09-23 06:21:19,883 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 9.55 sec
  2016-09-23 06:21:23,085 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 10.01 sec
  2016-09-23 06:21:26,218 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 10.45 sec
  2016-09-23 06:21:32,505 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 11.36 sec
  2016-09-23 06:21:37,774 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 12.21 sec
  2016-09-23 06:21:44,050 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 13.06 sec
  2016-09-23 06:21:47,205 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 13.48 sec
  2016-09-23 06:21:50,352 Stage-1 map = 100%, reduce = 86%, Cumulative CPU 13.92 sec
  2016-09-23 06:21:53,538 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 14.3 sec
  2016-09-23 06:21:56,694 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 14.72 sec
  2016-09-23 06:21:59,850 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 15.15 sec
  2016-09-23 06:22:05,074 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 16.09 sec
  2016-09-23 06:22:08,220 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 16.51 sec
  2016-09-23 06:22:14,509 Stage-1 map = 100%, reduce = 92%, Cumulative CPU 17.33 sec
  2016-09-23 06:22:17,665 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 17.77 sec
  2016-09-23 06:22:20,830 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 18.24 sec
  2016-09-23 06:22:26,128 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 19.1 sec
  2016-09-23 06:22:29,284 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 19.42 sec
  2016-09-23 06:22:35,616 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 20.3 sec
  2016-09-23 06:22:41,909 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 21.12 sec
  2016-09-23 06:22:48,204 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 21.91 sec
  2016-09-23 06:22:59,670 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 23.45 sec
  2016-09-23 06:24:00,528 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 23.88 sec
  MapReduce Total cumulative CPU time: 23 seconds 880 msec
  Ended Job = job_1474609415284_0007
  Loading data to table default.salaries_partition partition (yearid=null)
  Failed with exception Number of dynamic partitions created is 2881, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 2881.
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
  MapReduce Jobs Launched:
  Job 0: Map: 1 Reduce: 1 Cumulative CPU: 28.11 sec HDFS Read: 1354022 HDFS Write: 1191371 SUCCESS
  Total MapReduce CPU Time Spent: 28 seconds 110 msec
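The run above still fails at the MoveTask. The error message suggests raising hive.exec.max.dynamic.partitions to at least 2881, but that number is the real clue: a year column could never produce 2881 distinct partitions (note also the "partition (yearid=null)" line). In a dynamic partition insert, Hive binds the dynamic partition column to the last column of the SELECT list, so the query above actually partitioned by ex.salary. Moving yearid to the end of the SELECT list should fix it without touching the partition limits (a sketch; this corrected query was not re-run in the original article):

  hive> insert overwrite table salaries_partition partition(yearid)
      > select ex.teamid, ex.playerid, ex.lgid, ex.salary, ex.yearid
      > from salaries_external ex;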


Exporting Data

This writes query results out to files, here on the local filesystem. (A note on field delimiters follows the transcript.)

  hive> insert overwrite local directory '/usr/local/temp'
      > select ex.yearid, ex.teamid, ex.playerid, ex.salary
      > from salaries_external ex;
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_1474609415284_0008, Tracking URL = http://hadoopwy1:8088/proxy/application_1474609415284_0008/
  Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1474609415284_0008
  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
  2016-09-23 06:54:25,754 Stage-1 map = 0%, reduce = 0%
  2016-09-23 06:54:42,450 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.56 sec
  MapReduce Total cumulative CPU time: 1 seconds 560 msec
  Ended Job = job_1474609415284_0008
  Copying data to local directory /usr/local/temp
  Copying data to local directory /usr/local/temp
  MapReduce Jobs Launched:
  Job 0: Map: 1 Cumulative CPU: 1.56 sec HDFS Read: 1354022 HDFS Write: 1302986 SUCCESS
  Total MapReduce CPU Time Spent: 1 seconds 560 msec
  OK
  Time taken: 44.155 seconds
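One caveat: by default Hive separates the fields in exported files with the ^A (Ctrl-A, \001) control character, which is awkward to consume downstream. Hive 0.11 and later let you pick a friendlier delimiter in the export itself (a sketch using tab as the separator; not part of the original run):

  hive> insert overwrite local directory '/usr/local/temp'
      > row format delimited fields terminated by '\t'
      > select ex.yearid, ex.teamid, ex.playerid, ex.salary
      > from salaries_external ex;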



