
Hive Partitioned Tables

1. Why Use Partitioned Tables

When querying a large table, a WHERE clause can filter the results, but Hive still has to scan the entire table to produce them. If the large dataset is instead split into smaller datasets according to actual needs, the WHERE clause can pick out only the partition(s) that need to be queried, so only that subset of the data is read and query efficiency improves greatly.

2. What a Partitioned Table Really Is

In Hive, partitioning splits a table's data into separate directories: each partition value corresponds to a subdirectory under the table's directory, and the data files belonging to that partition are stored inside it.
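As an illustration only (assuming the default database, the default warehouse location /user/hive/warehouse, and the song1 table built in the examples below), the on-disk layout after loading three partitions would look roughly like this:

/user/hive/warehouse/song1/month=20220108/songs.txt
/user/hive/warehouse/song1/month=20220109/songs2.txt
/user/hive/warehouse/song1/month=20220110/songs3.txt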

3. Partitioned Table Syntax

3.1 Creating a Partitioned Table

3.1.1 Creating a Single-Level Partitioned Table

create table partitioned_table_name(
column_name1 data_type,
column_name2 data_type,
...
column_nameN data_type
)
partitioned by (partition_column1 data_type)
row format delimited
fields terminated by 'delimiter'
collection items terminated by 'delimiter'   -- optional
map keys terminated by 'delimiter'           -- optional
lines terminated by 'delimiter';             -- optional

create table song1(
id int,
name string,
num int
)
partitioned by (month string)
row format delimited fields terminated by '\t';
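Although the partition column month is not stored inside the data files, Hive treats it as a queryable column of the table. To confirm it was registered, you can describe the table (command shown only as a quick check; output omitted here):

hive> desc song1;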

3.1.2 Creating a Two-Level Partitioned Table

create table partitioned_table_name(
column_name1 data_type,
column_name2 data_type,
...
column_nameN data_type
)
partitioned by (partition_column1 data_type, partition_column2 data_type)
row format delimited
fields terminated by 'delimiter';

create table singpartition(
id int, name string, num int
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';
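With two partition keys, the directories nest one inside the other. As a sketch only (again assuming the default warehouse location and the load statement shown in 3.2.2), one partition of singpartition would be stored as:

/user/hive/warehouse/singpartition/month=20220111/day=13/songs.txt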

3.2 Loading Data into a Partitioned Table

3.2.1 Loading Data into a Single-Level Partitioned Table

load data local inpath 'local file path' into table partitioned_table_name partition(partition_column1=value);

hive> load data local inpath '/opt/songs.txt' into table song1 partition(month=20220108);
hive> load data local inpath '/opt/songs2.txt' into table song1 partition(month=20220109);
hive> load data local inpath '/opt/songs3.txt' into table song1 partition(month=20220110);
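The load commands above assume tab-separated files that contain only the non-partition columns (id, name, num); the month value comes from the partition() clause, not from the file itself. A hypothetical /opt/songs.txt consistent with the query results shown later would look like:

1	never	3
2	over	2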

3.2.2 Loading Data into a Two-Level Partitioned Table

load data local inpath 'local file path' into table partitioned_table_name partition(partition_column1=value, partition_column2=value);

load data local inpath '/opt/songs.txt' into table singpartition partition(month='20220111', day=13);

3.3 Querying, Adding, and Dropping Partitions

3.3.1 Querying a Single Partition

select columns from table_name where partition_column=value;

hive> select * from song1 where month=20220110;
OK
1 never3 3 20220110
2 over3 2 20220110

hive> select name from song1 where month=20220109;
OK
never2
over2

3.3.2 Querying Multiple Partitions

① Using UNION

select columns from table_name where partition_column=value1
union
select columns from table_name where partition_column=value2;

hive> select * from song1 where month='20220109'
    > union
    > select * from song1 where month='20220110';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220110232623_3d981c7c-6cee-460d-a564-038775507413
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1641870198202_0002, Tracking URL = http://master:8088/proxy/application_1641870198202_0002/
Kill Command = /opt/software/hadoop-2.7.1/bin/hadoop job -kill job_1641870198202_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2022-01-10 23:26:52,299 Stage-1 map = 0%, reduce = 0%
2022-01-10 23:27:15,111 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.97 sec
2022-01-10 23:27:24,753 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.3 sec
MapReduce Total cumulative CPU time: 8 seconds 300 msec
Ended Job = job_1641870198202_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 8.3 sec HDFS Read: 16922 HDFS Write: 213 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 300 msec
OK
1 never2 3 20220109
1 never3 3 20220110
2 over2 2 20220109
2 over3 2 20220110

② Using OR

select columns from table_name where partition_column=value1 or partition_column=value2;

hive> select * from song1 where month=20220110 or month=20220108;
OK
1 never 3 20220108
2 over 2 20220108
1 never3 3 20220110
2 over3 2 20220110
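Hive also accepts an IN predicate on the partition column, which expresses the same multi-partition filter more compactly (a variant not shown in the original examples):

select * from song1 where month in ('20220108', '20220110');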

3.3.3 Querying Data in a Two-Level Partitioned Table

Syntax: select columns from table_name where partition_column1=value1 and partition_column2=value2;

select * from singpartition where month='20220111' and day=13;

3.3.4 Adding Partitions

① Add a single partition

Syntax: alter table table_name add partition(partition_column='value');

hive> alter table song1 add partition(month='20220112');

② Add multiple partitions at once

Syntax: alter table table_name add partition(partition_column='value1') partition(partition_column='value2');

hive> alter table song1 add partition(month='20220111') partition(month='20220106');
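Adding a partition that already exists raises an error; Hive's alter table ... add also accepts an if not exists clause that makes the statement safe to rerun (a sketch, not part of the original examples):

alter table song1 add if not exists partition(month='20220111');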

3.3.5 Dropping Partitions

① Drop a single partition

Syntax: alter table table_name drop partition(partition_column='value');

hive> alter table song1 drop partition(month='20220112');
Dropped the partition month=20220112
OK
Time taken: 0.777 seconds

② Drop multiple partitions

Syntax: alter table table_name drop partition(partition_column='value1'), partition(partition_column='value2');

hive> alter table song1 drop partition(month='20220111'), partition(month='20220106');
Dropped the partition month=20220106
Dropped the partition month=20220111
OK
Time taken: 0.407 seconds

3.3.6 Listing a Table's Partitions

Syntax: show partitions table_name;

hive> show partitions song1;
OK
month=20220108
month=20220109
month=20220110
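For the two-level table, the same command lists both partition keys in each entry, using the nested directory form partition_column1=value/partition_column2=value. Based on the partition loaded earlier (a sketch, not captured output):

hive> show partitions singpartition;
month=20220111/day=13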
