
flink1.14.4 + iceberg0.13.2 + Hive2.1.1 + presto0.273.3 + yanagishima 18.0 + FineBI5.1.24 (Part 3): Integrating Iceberg with Hive 2.1.1

Integrating Iceberg with Hive

Copy the Iceberg Hive runtime jar into Hive's lib and auxlib directories:

 cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/auxlib
 cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/lib/
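The two copies above can be wrapped in a small function so the step is repeatable on every Hive node. This is only a sketch: `install_iceberg_jar` is a hypothetical helper name, and creating a missing auxlib directory is an assumption about what is wanted here, not something the original steps do.

```shell
# Hypothetical helper: copy the Iceberg Hive runtime jar into Hive's auxlib and lib.
# $1 = path to the jar
# $2 = Hive home directory (in this article: /opt/cloudera/parcels/CDH/lib/hive)
install_iceberg_jar() {
    jar="$1"
    hive_home="$2"
    # Fail early if the jar is not where we expect it.
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    for dir in "$hive_home/auxlib" "$hive_home/lib"; do
        # Assumption: create the target directory if it does not exist yet.
        mkdir -p "$dir" || return 1
        cp "$jar" "$dir/" || return 1
    done
}
```

With the paths used in this article, the call would be `install_iceberg_jar /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive`.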

1. Enable Iceberg support

hive> add jar /home/bonc/iceberg-hive-runtime-0.13.2.jar;
hive> set iceberg.engine.hive.enabled=true;

The same switch can also be configured in hive-site.xml:

<property>
        <name>iceberg.engine.hive.enabled</name>
        <value>true</value>
        <description>Whether Hive enables Iceberg support</description>
</property>

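2. Catalog management

Hive itself has no notion of a catalog, but Iceberg does. Hive therefore represents catalog information as key-value properties, so that a catalog defined this way can be referenced directly when creating a table.

The Hive integration with Iceberg supports both Hive Catalog and Hadoop Catalog.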

2.1 Create a Hive Catalog

set iceberg.catalog.hive_catalog.type=hive;
set iceberg.catalog.hive_catalog.uri=thrift://hive1:9083;
set iceberg.catalog.hive_catalog.clients=5;
set iceberg.catalog.hive_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hive_catalog;

2.2 Create a Hadoop Catalog

hive> set iceberg.engine.hive.enabled=true;
hive> set iceberg.catalog.hadoop_catalog.type=hadoop;
hive> set iceberg.catalog.hadoop_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hadoop_catalog;
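Both catalog definitions follow the same key pattern, `iceberg.catalog.<catalog name>.<property>`. The sketch below prints the `set` statements for a given catalog to make that pattern explicit. `catalog_settings` is a hypothetical helper name, and it only covers the type, uri and warehouse properties that appear in the statements above.

```shell
# Hypothetical helper: print the Hive "set" statements that define an Iceberg catalog.
# $1 = catalog name (e.g. hive_catalog or hadoop_catalog)
# $2 = catalog type (hive or hadoop)
# $3 = warehouse path
# $4 = optional metastore thrift URI (used for type hive)
catalog_settings() {
    name="$1"
    type="$2"
    warehouse="$3"
    uri="${4:-}"
    prefix="iceberg.catalog.$name"
    echo "set $prefix.type=$type;"
    # Only emit the uri property when one was passed in.
    if [ -n "$uri" ]; then
        echo "set $prefix.uri=$uri;"
    fi
    echo "set $prefix.warehouse=$warehouse;"
}
```

For example, `catalog_settings hadoop_catalog hadoop hdfs://cdh01:8020/user/iceberg/hadoop_catalog` prints the two hadoop_catalog statements from section 2.2. The output could be saved to a file and loaded at session start with the Hive CLI's `-i` option, so the catalogs do not have to be typed in for every session.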

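3. Creating databases

3.1 Databases under a Hive Catalog

If another system uses this Hive as its catalog and has already created a database, that database can be used directly without creating it again, because Hive databases and Iceberg databases map onto each other directly.

3.2 Databases under a Hadoop Catalog

Because Hive has no notion of a catalog, it cannot automatically discover databases through a catalog defined as above. A Hive database has to be created to correspond to the Iceberg database, for example: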

hive> create schema iceberg_db location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db';

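4. Creating and dropping tables

External tables
For Iceberg tables that were already created by another system, create an external table in Hive to read and write the Iceberg table.

4.1 Tables under a Hive Catalog

If another system uses this Hive as its catalog and has already created a table, that table can be used directly without creating it again, because Hive tables and Iceberg tables map onto each other directly.

4.2 Tables under a Hadoop Catalog

Create a Hive table that corresponds to the Iceberg table. Queries against it return the same results as the Iceberg table itself.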

hive> create external table iceberg_db.t_iceberg_sample_1(
  id bigint, data string
)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/t_iceberg_sample_1'
tblproperties('iceberg.catalog'='hadoop_catalog');
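The locations used in sections 3.2 and 4.2 follow the Hadoop Catalog directory layout: the database sits directly under the catalog warehouse, and the table sits under the database. The sketch below derives both paths from the warehouse setting so they stay consistent with the catalog definition. `hadoop_catalog_path` is a hypothetical helper name.

```shell
# Hypothetical helper: derive the path that the Hadoop Catalog layout implies.
# $1 = catalog warehouse path
# $2 = database name
# $3 = optional table name
hadoop_catalog_path() {
    warehouse="${1%/}"   # drop one trailing slash, if present
    db="$2"
    table="${3:-}"
    if [ -n "$table" ]; then
        echo "$warehouse/$db/$table"
    else
        echo "$warehouse/$db"
    fi
}
```

Calling `hadoop_catalog_path hdfs://cdh01:8020/user/iceberg/hadoop_catalog iceberg_db t_iceberg_sample_1` yields the LOCATION used in the create external table statement above; leaving out the table name yields the location used for the schema.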

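If a table is created without the iceberg.catalog table property, the Hive Catalog is used by default and the metadata is stored in the current Hive's metadata location. If no storage location is specified, the table data is stored in the current Hive warehouse.

4.3 create table

Iceberg tables can be created directly from Hive. The default iceberg.catalog is the Hive Catalog.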

hive> create table iceberg_db.student(
       id bigint,
       name string
     ) partitioned by (birthday date, country string)
     stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

hive> show create table iceberg_db.student;
show create table iceberg_db.student
OK
CREATE TABLE `iceberg_db.student`(
  `id` bigint COMMENT 'from deserializer', 
  `name` string COMMENT 'from deserializer', 
  `birthday` date COMMENT 'from deserializer', 
  `country` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.iceberg.mr.hive.HiveIcebergSerDe' 
STORED BY 
  'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION
  'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student'
TBLPROPERTIES (
  'engine.hive.enabled'='true', 
  'external.table.purge'='TRUE', 
'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student/metadata/00000-2f2f2315-7b25-49a1-a89f-6dc268e3ae26.metadata.json', 
  'table_type'='ICEBERG', 
  'transient_lastDdlTime'='1657524217', 
  'uuid'='25b91543-6023-42db-b0e7-b6e4ac88ac53')
Time taken: 0.419 seconds, Fetched: 19 row(s)

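The table's path on HDFS looks as follows; it also contains the Iceberg table's metadata.
[Image: HDFS directory listing of the student table]
After dropping the table, the student table directory no longer exists on HDFS: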

hive> drop table if exists iceberg_db.student;
drop table iceberg_db.student
OK
Time taken: 0.279 seconds
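The check on HDFS after the drop can also be scripted. A minimal sketch, assuming the `hdfs` client is on the PATH; `table_dir_state` is a hypothetical helper name, built on `hdfs dfs -test -d`, which succeeds only if the path is an existing directory.

```shell
# Hypothetical helper: report whether a table directory exists on HDFS.
# $1 = full HDFS path of the table directory
table_dir_state() {
    # "hdfs dfs -test -d" exits with 0 when the path exists and is a directory.
    if hdfs dfs -test -d "$1"; then
        echo "present"
    else
        echo "absent"
    fi
}
```

Once the drop has gone through, `table_dir_state hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student` should print `absent`.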

Create a table and specify iceberg.catalog:

hive> create table iceberg_db.employee(
          id bigint,
          name string
 ) partitioned by (birthday date, country string)
 stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
 location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
 tblproperties('iceberg.catalog'='hadoop_catalog');

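Although the Iceberg table is partitioned, the partition information does not show up in the Hive table definition. Computed columns are currently not supported as partition columns.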

hive> show create table iceberg_db.employee;
OK
CREATE TABLE `iceberg_db.employee`(
  `id` bigint COMMENT 'from deserializer', 
  `name` string COMMENT 'from deserializer', 
  `birthday` date COMMENT 'from deserializer', 
  `country` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.iceberg.mr.hive.HiveIcebergSerDe' 
STORED BY 
  'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 

LOCATION
  'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
TBLPROPERTIES (
  'engine.hive.enabled'='true', 
  'external.table.purge'='TRUE', 
  'iceberg.catalog'='hadoop_catalog', 
'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee/metadata/00000-c8bc7b63-1db4-4380-aa4f-435f11b2f2da.metadata.json', 
  'table_type'='ICEBERG', 
  'transient_lastDdlTime'='1657524915', 
  'uuid'='78bf3e9a-ab8d-4487-9674-96cf9af89919')
Time taken: 0.343 seconds, Fetched: 20 row(s)


Insert data

hive> insert into iceberg_db.student(id, name, birthday, country) 
values(1, 'zhang_san', null, 'china'),
(2, 'zhang_san', null, 'china');

Query data

select * from iceberg_db.student;