Note: If you find this blog helpful, don't forget to like and bookmark it. I publish content on artificial intelligence and big data every week, most of it original: Python, Java, Scala, and SQL code; CV, NLP, and recommender systems; Spark, Flink, Kafka, HBase, Hive, Flume, and more. It's all hands-on material, including paper walkthroughs from the top conferences. Let's improve together.
Today's topic in the Doris series: table creation operations.
-- Aggregation model: pv carries the SUM aggregate, so rows sharing the
-- same key (siteid, citycode, username) are merged and their pv summed.
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
insert into table1 values(1,1,'user1',10);
insert into table1 values(1,1,'user1',10);
insert into table1 values(1,2,'user1',10);
insert into table1 values(1,2,'user1',10);
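Under the SUM model, the four inserts above collapse into two rows per key. A quick check (expected output shown as comments, following directly from the aggregation semantics):

SELECT * FROM table1 ORDER BY siteid, citycode;
-- siteid | citycode | username | pv
--      1 |        1 | user1    | 20
--      1 |        2 | user1    | 20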
-- The same table, but with pv using the REPLACE aggregate: rows sharing
-- the same key keep only the most recently loaded pv value.
-- Drop the previous table1 first, since the name is reused:
DROP TABLE IF EXISTS table1;
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT REPLACE DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
insert into table1 values(1,1,'user1',10);
insert into table1 values(1,1,'user1',20);
insert into table1 values(1,2,'user1',10);
insert into table1 values(1,2,'user1',30);
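With REPLACE, each key keeps the pv value from the later of the two loads (the inserts above are separate statements, so their order is well defined):

SELECT * FROM table1 ORDER BY siteid, citycode;
-- siteid | citycode | username | pv
--      1 |        1 | user1    | 20
--      1 |        2 | user1    | 30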
For scenarios like the one below, composite partitioning (RANGE partitions plus HASH distribution) is recommended.
CREATE TABLE table2
(
    event_day DATE,
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(event_day, siteid, citycode, username)
PARTITION BY RANGE(event_day)
(
    PARTITION p202106 VALUES LESS THAN ('2021-07-01'),
    PARTITION p202107 VALUES LESS THAN ('2021-08-01'),
    PARTITION p202108 VALUES LESS THAN ('2021-09-01')
)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "3");
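One benefit of range partitioning is that partitions can be inspected and extended as new data arrives. A minimal sketch using standard Doris statements (the partition name and date below are illustrative, not from the original text):

SHOW PARTITIONS FROM table2;
-- Add a partition for the next month (illustrative):
ALTER TABLE table2 ADD PARTITION p202109 VALUES LESS THAN ('2021-10-01');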
Import data (save the following pipe-separated rows to a local file named table2_data, matching the -T argument in the curl command below):
2021-06-03|9|1|jack|3
2021-06-10|10|2|rose|2
2021-07-03|11|1|jim|2
2021-07-05|12|1|grace|2
2021-07-12|13|2|tom|2
2021-08-15|14|3|bush|3
2021-08-12|15|3|helen|3
curl --location-trusted -u root:123456 -H "label:table2_20210707" -H "column_separator:|" -T table2_data http://node01:8030/api/test_db/table2/_stream_load
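After the Stream Load response reports Success, the monthly totals can be verified with an ordinary aggregate query (the expected sums follow from the seven sample rows above):

SELECT event_day, SUM(pv) AS pv
FROM table2
GROUP BY event_day
ORDER BY event_day;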
Broker Load is an asynchronous import method; different data sources require different broker processes to be deployed. The brokers already deployed can be listed with the SHOW BROKER command.
# Create the table
CREATE TABLE test_db.user_result
(
    id BIGINT,
    name VARCHAR(50),
    age INT,
    gender INT,
    province VARCHAR(50),
    city VARCHAR(50),
    region VARCHAR(50),
    phone VARCHAR(50),
    birthday VARCHAR(50),
    hobby VARCHAR(50),
    register_date VARCHAR(50)
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10;

# Load data from HDFS
LOAD LABEL test_db.user_result
(
    DATA INFILE("hdfs://node01:8020/datas/user.csv")
    INTO TABLE `user_result`
    COLUMNS TERMINATED BY ","
    FORMAT AS "csv"
    (id, name, age, gender, province, city, region, phone, birthday, hobby, register_date)
)
WITH BROKER broker_10_20_30
(
    "dfs.nameservices" = "my_cluster",
    "dfs.ha.namenodes.my_cluster" = "nn1,nn2,nn3",
    "dfs.namenode.rpc-address.my_cluster.nn1" = "node01:8020",
    "dfs.namenode.rpc-address.my_cluster.nn2" = "node02:8020",
    "dfs.namenode.rpc-address.my_cluster.nn3" = "node03:8020",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
PROPERTIES
(
    "max_filter_ratio" = "0.00002"
);
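Because Broker Load is asynchronous, the LOAD statement returns immediately and progress has to be polled. A minimal check, assuming the label used above:

SHOW LOAD FROM test_db WHERE LABEL = "user_result";
-- The State column moves from PENDING/LOADING to FINISHED (or CANCELLED on failure).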
That's all for today. This article only gave a brief introduction to table creation in Doris; future posts in the series will cover more of Doris's features.