赞
踩
Hive中有多种修改数据的方式:
使用Load的方式,数据不会有转换动作。Load操作只是单纯的将数据从原目录移动到目标Hive表的对应数据目录。
语法:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
说明:Hive3.0之前的Hive Load操作是纯移动操作,用于将数据文件移动到与Hive表相对应的数据目录。
project/data1
/user/hive/warehouse/test_table
hdfs://namenode:9000/user/hive/project/data1
LOCAL
,那么:
/user/hive/project/data1
和file:///user/hive/project/data1
LOCAL
,那么Hive要么使用filepath的完整URI,要么遵循以下规则:
fs.default.name
作为系统前缀来指定完整URI./user/<username>
OVERWRITE
,数据加载方式为覆盖,否则为追加。语法:
# 标准语法: INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement; INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement; # Hive 延伸 (多个inserts): FROM from_statement INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...; FROM from_statement INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...; # Hive 延伸 (动态分区入库): INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement; INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT OVERWRITE
覆盖表或分区INSERT INTO
追加动态分区配置属性 | 默认值 | 备注 |
---|---|---|
hive.error.on.empty.partition | false | 动态分区入库生成空结果时,是否抛异常 |
hive.exec.dynamic.partition | true | true 表示开启动态分区入库 |
hive.exec.dynamic.partition.mode | strict | strict模式下,用户必须制定至少一个静态分区,防止用户以外覆盖了所有分区。nonstrict模式下所有分区都允许被动态更新 |
hive.exec.max.created.files | 100000 | MapReduce作业中所有mappers/reducers创建的HDFS文件的最大数目 |
hive.exec.max.dynamic.partitions | 1000 | 允许被创建的动态分区数目的最大值 |
hive.exec.max.dynamic.partitions.pernode | 100 | 在每个mappers/reducers中允许被创建的动态分区数目的最大值 |
Example:
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
SELECT t.viewTime, t.userid, t.page_url, t.referrer_url, null, null, t.ip, t.cnt
FROM page_view_stg t
INSERT OVERWRITE DIRECTORY ‘/hivetmp’ ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t’ SELECT empno,ename FROM emp;
语法:
# 标准语法: INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0) SELECT ... FROM ... # Hive 延伸 (多个inserts): FROM from_statement INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1 [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ... row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
Example:
INSERT OVERWRITE LOCAL DIRECTORY '/hivetmp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT empno,ename FROM emp;
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。