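Integrating Iceberg with Hive
Copy the Iceberg Hive runtime jar into Hive's lib and auxlib directories, then load the jar in the Hive session and enable Iceberg support: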
cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/auxlib
cp /home/bonc/iceberg-hive-runtime-0.13.2.jar /opt/cloudera/parcels/CDH/lib/hive/lib/
hive> add jar /home/bonc/iceberg-hive-runtime-0.13.2.jar;
hive> set iceberg.engine.hive.enabled=true;
The same setting can also be placed in hive-site.xml:
<property>
<name>iceberg.engine.hive.enabled</name>
<value>true</value>
<description>Whether Hive enables Iceberg support</description>
</property>
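A quick way to confirm the setup took effect is to inspect the session from the Hive CLI. This is a minimal sketch: `list jars` only reports jars added in the current session with `add jar` (jars picked up from auxlib will not appear there), and `set` followed by a property name with no value prints that property's current value.

```sql
-- Show jars added to this session via "add jar"
list jars;

-- Print the current value of the Iceberg switch (no "=value" means "read")
set iceberg.engine.hive.enabled;
```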
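Hive itself has no concept of a catalog, but Iceberg does. Hive therefore represents catalog information as key-value properties, so a catalog registered this way can be referenced directly when creating a table.
The Hive integration for Iceberg supports the Hive Catalog and the Hadoop Catalog. The settings below register one of each, named hive_catalog and hadoop_catalog: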
set iceberg.catalog.hive_catalog.type=hive;
set iceberg.catalog.hive_catalog.uri=thrift://hive1:9083;
set iceberg.catalog.hive_catalog.clients=5;
set iceberg.catalog.hive_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hive_catalog;
hive> set iceberg.engine.hive.enabled=true;
hive> set iceberg.catalog.hadoop_catalog.type=hadoop;
hive> set iceberg.catalog.hadoop_catalog.warehouse=hdfs://cdh01:8020/user/iceberg/hadoop_catalog;
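If another system uses this Hive instance as its catalog, the databases it creates can be used directly without creating them again, because Hive databases and Iceberg databases map to each other one to one.
Because Hive has no concept of a catalog, databases are not discovered automatically through a catalog registered as above. You need to create a Hive database that corresponds to the Iceberg database, for example: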
hive> create schema iceberg_db location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db';
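To check that the Hive database really points at the Iceberg database directory, the database can be inspected after creation. A minimal sketch using standard Hive statements; the expectation that the reported location matches the path passed to `create schema` is an assumption about this environment.

```sql
-- Confirm the database exists
show databases like 'iceberg_db';

-- Print the database's properties, including its location on HDFS
describe database iceberg_db;
```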
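External tables
For Iceberg tables that were already created by another system, create an external table in Hive to read and write them.
If another system uses this Hive instance as its catalog, the tables it creates can be used directly without creating them again, because Hive tables and Iceberg tables map to each other one to one.
Otherwise, create a Hive table that maps to the Iceberg table. Queries against it return the same results as the Iceberg table.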
hive> create external table iceberg_db.t_iceberg_sample_1(
id bigint, data string
)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/t_iceberg_sample_1'
tblproperties('iceberg.catalog'='hadoop_catalog');
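Once the external table is defined, a couple of standard Hive statements can confirm that it is backed by the Iceberg storage handler and that rows are readable. This is only a sketch; it assumes the underlying Iceberg table already exists at the location given above and contains data.

```sql
-- Show storage handler, location and table properties as Hive sees them
describe formatted iceberg_db.t_iceberg_sample_1;

-- Read a few rows through the Iceberg storage handler
select id, data
from iceberg_db.t_iceberg_sample_1
limit 10;
```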
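If the iceberg.catalog table property is not specified when creating a table, the Hive Catalog is used by default and the metadata is stored in the current Hive metastore. If no location is specified, the table data is stored in the current Hive warehouse.
Iceberg tables can also be created directly from Hive. The default iceberg.catalog is the Hive Catalog.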
hive> create table iceberg_db.student(
id bigint,
name string
) partitioned by (birthday date, country string)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
hive> show create table iceberg_db.student;
show create table iceberg_db.student
OK
CREATE TABLE `iceberg_db.student`(
  `id` bigint COMMENT 'from deserializer',
  `name` string COMMENT 'from deserializer',
  `birthday` date COMMENT 'from deserializer',
  `country` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.iceberg.mr.hive.HiveIcebergSerDe'
STORED BY
  'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
  'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student'
TBLPROPERTIES (
  'engine.hive.enabled'='true',
  'external.table.purge'='TRUE',
  'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/student/metadata/00000-2f2f2315-7b25-49a1-a89f-6dc268e3ae26.metadata.json',
  'table_type'='ICEBERG',
  'transient_lastDdlTime'='1657524217',
  'uuid'='25b91543-6023-42db-b0e7-b6e4ac88ac53')
Time taken: 0.419 seconds, Fetched: 19 row(s)
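Checking the table's path on HDFS shows that the Iceberg metadata for the table is stored there as well.
Drop the table and check the HDFS directory again: the student table directory no longer exists.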
hive> drop table if exists iceberg_db.student;
drop table iceberg_db.student
OK
Time taken: 0.279 seconds
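The directory disappears because the table carries 'external.table.purge'='TRUE', as the show create table output for student indicates. A hedged sketch of the opposite case follows: setting the property to FALSE at creation time should leave the files in place after a drop. The table name student_keep is hypothetical, and the behavior is an assumption based on how this property is documented for Iceberg's Hive integration, so verify it on your version before relying on it.

```sql
-- Hypothetical table: same layout as student, but with purge disabled
create table iceberg_db.student_keep(
  id bigint,
  name string
)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
tblproperties('external.table.purge'='FALSE');

-- Dropping it removes the Hive table definition; under the assumption
-- above, the data and metadata files remain on HDFS
drop table if exists iceberg_db.student_keep;
```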
Create a table with iceberg.catalog specified
hive> create table iceberg_db.employee(
id bigint,
name string
) partitioned by (birthday date, country string)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
location 'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
tblproperties('iceberg.catalog'='hadoop_catalog');
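Although the Iceberg table is partitioned, the partition information is not visible in the Hive table definition. Using a computed column as a partition column is not currently supported.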
hive> show create table iceberg_db.employee;
OK
CREATE TABLE `iceberg_db.employee`(
  `id` bigint COMMENT 'from deserializer',
  `name` string COMMENT 'from deserializer',
  `birthday` date COMMENT 'from deserializer',
  `country` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.iceberg.mr.hive.HiveIcebergSerDe'
STORED BY
  'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
  'hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee'
TBLPROPERTIES (
  'engine.hive.enabled'='true',
  'external.table.purge'='TRUE',
  'iceberg.catalog'='hadoop_catalog',
  'metadata_location'='hdfs://cdh01:8020/user/iceberg/hadoop_catalog/iceberg_db/employee/metadata/00000-c8bc7b63-1db4-4380-aa4f-435f11b2f2da.metadata.json',
  'table_type'='ICEBERG',
  'transient_lastDdlTime'='1657524915',
  'uuid'='78bf3e9a-ab8d-4487-9674-96cf9af89919')
Time taken: 0.343 seconds, Fetched: 20 row(s)
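Since birthday and country show up in Hive as ordinary columns rather than as Hive partitions, they are filtered like any other column. The query below is a minimal sketch; that Iceberg can use such a filter to skip data files for non-matching partitions is an assumption about the storage handler's predicate pushdown, not something shown in the output above.

```sql
-- country was declared in "partitioned by", but is queried as a normal column
select id, name, birthday
from iceberg_db.employee
where country = 'china';
```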
Insert data (the student table was dropped above, so recreate it before running this statement)
hive> insert into iceberg_db.student(id, name, birthday, country)
values(1, 'zhang_san', null, 'china'),
(2, 'zhang_san', null, 'china');
Query data
select * from iceberg_db.student;