当前位置:   article > 正文

Hive表的定义,删除、修改_hive外部表 删除字段

hive外部表 删除字段

Hive表的定义、删除

创建表

只涉及简单的建表,不涉及分区等复杂操作。

1. 建表语句
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
2. row_format说明
  • 功能说明:用于指定相关分隔符
  • row_format
FIELDS TERMINATED BY  '$'			/*使用$作为字段分隔符*/
MAP KEYS TERMINATED BY '$'  		/*使用$作为map解析数据时的key value分隔符*/
LINES TERMINATED BY '$'				/*使用$作为行与行分隔符*/
COLLECTION ITEMS TERMINATED BY '$'	/*使用$作为 Array 中的各元素、 Struct 中的各元素、各个map之间的分隔符*/
  • 1
  • 2
  • 3
  • 4
3. file_format说明
  • 功能说明:指定HDFS文件存放的格式
4. 外部表说明(EXTERNAL)
  • 功能说明:使用EXTERNAL声明一个表为外部表
  • 使用示例
CREATE EXTERNAL TABLE rowtest(row int);
  • 1
  • 与内部表区别
    • 内部表数据由Hive自身管理,外部表数据由HDFS管理;
    • 删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除;
    • 对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)
5. LOCATION
  • 功能说明:指定表的存储位置。
  • 注意
    • 内部表可以不指定改参数,内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse)
    • 外部表数据的存储位置由自己制定;

删除表

1. 删除语句
DROP TABLE [IF EXISTS] table_name [PURGE];
PURGE: 直接删除,不移入回收站
  • 1
  • 2
2. 注意事项
  • 如果.Trash/Current配置,且PURGE未指定时,该语句删除时会将该表放入.Trash/Current 目录

例子

内部表

假定HDFS文件/zpy/test/hive/i_row.txt文件格式如下:

abc,zhao-py
qwe,zh-wei
  • 1
  • 2
  • 建立内部表
# 以下可用
create table zhaopy.irowtest(
    word string,
    love array<string>
) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
LOCATION '/zpy/hive/i';
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 加载内部表数据

load命令会将文件移动到rowtest指定的存储位置.

load data inpath '/zpy/test/hive/i_row.txt' into table zhaopy.irowtest;
  • 1
  • 查看内部表数据
hive> select * from zhaopy.irowtest;
OK
abc     ["zhao","py"]
qwe     ["zh","wei"]
Time taken: 0.125 seconds, Fetched: 2 row(s)
  • 1
  • 2
  • 3
  • 4
  • 5
外部表

假定HDFS文件/zpy/test/hive/f_row.txt文件格式如下:

abc,zhao-py
qwe,zh-wei
  • 1
  • 2
  • 建立外部表
# 以下可用
create EXTERNAL table zhaopy.frowtest(
    word string,
    love array<string>
) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
LOCATION '/zpy/hive/f';
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 加载外部表数据

load命令会将文件移动到rowtest指定的存储位置.

load data inpath '/zpy/test/hive/f_row.txt' into table zhaopy.frowtest;
  • 1
  • 查看外部表数据
hive> select * from zhaopy.frowtest;
OK
abc     ["zhao","py"]
qwe     ["zh","wei"]
Time taken: 0.125 seconds, Fetched: 2 row(s)
  • 1
  • 2
  • 3
  • 4
  • 5
查看两个表存储位置
[ochadoop@server7 hivetest]$ hadoop fs -ls /zpy/hive/i
Found 1 items
-rwxrwxrwx   3 ochadoop ochadoop         23 2018-09-04 16:44 /zpy/hive/i/i_row.txt
[ochadoop@server7 hivetest]$ hadoop fs -ls /zpy/hive/f
Found 1 items
-rwxrwxrwx   3 ochadoop ochadoop         23 2018-09-04 16:44 /zpy/hive/f/f_row.txt
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
删除两个表,查看两者存储位置的文件是否一并被删除
  • 删除表
drop table zhaopy.irowtest;
drop table zhaopy.frowtest;
  • 1
  • 2
  • 查看目录
[ochadoop@server7 hivetest]$ hadoop fs -ls /zpy/hive/i
ls: `/zpy/hive/i': No such file or directory
[ochadoop@server7 hivetest]$ hadoop fs -ls /zpy/hive/f
Found 1 items
-rwxrwxrwx   3 ochadoop ochadoop         23 2018-09-04 16:44 /zpy/hive/f/f_row.txt
  • 1
  • 2
  • 3
  • 4
  • 5

此处存在一个问题,collection与map keys同时定义出错,原因未知

表的修改

1. 表级别的修改
  • 表的重命名
ALTER TABLE table_name RENAME TO new_table_name;
  • 1
  • 表的元数据添加/修改
ALTER TABLE table_name SET TBLPROPERTIES table_properties;

table_properties:
  : (property_name = property_value, property_name = property_value, ... )
  • 1
  • 2
  • 3
  • 4

元数据分为用户自定义和hive预定义两种,系统预定义见:系统预定义。Hive元数据见:Hive元数据,hive元数据的概念及其查看见:Hive元数据概念及其查看

  • 表的序列化/反序列化修改
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];

ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;

serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
// 使用举例:更改数据库表的字段分隔符
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = ',');
  • 1
  • 2
  • 表的存储属性修改
ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
  INTO num_buckets BUCKETS;
  • 1
  • 2

表的分区修改

表的列级修改
  • 列的更改
    用法与SQL中的相似。使用例子摘自官网
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

// example 
CREATE TABLE test_change (a int, b int, c int);

// First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;

// Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
// The new table's structure is:  b int, a2 string, c int.

// Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
// The new table's structure is:  c1 int, b int, a2 string.

// Add a comment to column a1
ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 列的添加或替换
ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)
  • 1
  • 2
  • 3
  • 4

使用示例

// 给student表添加一个新列class
ALTER TABLE student ADD COLUMNs (class string COMMENT 'class');
// 由于hive无法针对列进行删除,因此我们采用replace替换。
// 假若原始表有三个字段,按顺序为id-name-class,假设我们要删除name,并替换id与class位置,可以使用如下SQL
ALTER TABLE student REPLACE COLUMNS (class string, id int);
  • 1
  • 2
  • 3
  • 4
  • 5
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/你好赵伟/article/detail/718830
推荐阅读
相关标签
  

闽ICP备14008679号