当前位置:   article > 正文

内部表与外部表详解_外部表和内部表

外部表和内部表

内部表&外部表
定义:未被external修饰的是内部表(managed table),被external修饰的为外部表(external table);
区别:

  • 内部表数据由Hive自身管理,外部表数据由HDFS管理;
  • 内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse),外部表数据的存储位置由自己制定(如果没有LOCATION,Hive将在HDFS上的/user/hive/warehouse文件夹下以外部表的表名创建一个文件夹,并将属于这个表的数据存放在这里);
  • 删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除;
  • 对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)

如下,进行试验进行理解

试验理解
创建内部表t1


```sql
create table t1(
    id      int,
    name    string,
    hobby   array<string>,
    add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

在这里插入图片描述

  1. 查看表的描述:desc t1;
    这里写图片描述

装载数据(t1)

注:一般很少用insert (不是insert overwrite)语句,因为就算就算插入一条数据,也会调用MapReduce,这里我们选择Load Data的方式。

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
  • 1

1
创建一个文件粘贴上述记录,并上载即可,如下图:
这里写图片描述

文件内容如下

1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin
  • 1
  • 2
  • 3

然后上载

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1;
  • 1

别忘记写文件名/data,笔者第一次忘记写,把整个Desktop上传了,一查全是null和乱码。
查看表内容:

select * from t1;
  • 1

1这里写图片描述

创建一个外部表t2

create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

这里写图片描述

装载数据(t2)

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t2;
  • 1

1这里写图片描述

查看文件位置
如下图,我们在NameNode:50070/explorer.html#/user/目录下,可以看到t2文件
这里写图片描述

t1在哪呢?在我们之前配置的默认路径里
这里写图片描述

同样我们可以通过命令行获得两者的位置信息:

desc formatted table_name;
  • 1

1这里写图片描述

这里写图片描述

注:图中managed table就是内部表,而external table就是外部表。
##分别删除内部表和外部表
下面分别删除内部表和外部表,查看区别
这里写图片描述

观察HDFS上的文件
发现t1已经不存在了
这里写图片描述

但是t2仍然存在
这里写图片描述

因而外部表仅仅删除元数据

重新创建外部表t2

create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2'
;
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

不往里面插入数据,我们select * 看看结果
这里写图片描述

可见数据仍然在!!!

官网解释
以下是官网中关于external表的介绍:

A table created without the EXTERNAL clause is called a managed table because Hive manages its data. Managed and External Tables By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The
default location can be overridden by the location property during
table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration. Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables. An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information. Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped. Managed or external tables can be identified using the DESCRIBE FORMATTED table_name command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type. Statistics can be managed on internal and external tables and partitions for query optimization.

Hive官网介绍:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/Column

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/盐析白兔/article/detail/693666
推荐阅读
相关标签
  

闽ICP备14008679号