当前位置:   article > 正文

Clickhouse select查询(含with、sample、array join、prewhere、having、limit by、union all、执行计划等)

clickhouse select

1. select完整语法

[WITH expr |(subquery)]
SELECT [DISTINCT] expr
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE expr]
[[LEFT] ARRAY JOIN]
[GLOBAL] [ALL|ANY|ASOF] [INNER | CROSS | [LEFT|RIGHT|FULL [OUTER]] ] JOIN
(subquery)|table ON|USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr] [WITH ROLLUP|CUBE|TOTALS]
[HAVING expr]
[ORDER BY expr]
[LIMIT [n[,m]]
[UNION ALL]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT [offset] n BY columns]
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17

语法的解析大致是按此顺序进行的

1.1 准备数据

clickhouse1 :) 
clickhouse1 :) create table select_table_test(
:-] id UInt32,
:-] name String,
:-] age UInt8,
:-] city String,
:-] score Float64
:-] ) engine = MergeTree()
:-] order by (id, intHash32(id))
:-] sample by intHash32(id);
clickhouse1 :) 
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(1, 'Zhangsan', 30, 'Beijing', 70);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(2, 'Lisi', 40, 'Shanghai', 80);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(3, 'Wangwu', 50, 'Guangzhou', 90);
clickhouse1 :)
clickhouse1 :) select * from select_table_test;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│  1 │ Zhangsan │  30 │ Beijing │    70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│  3 │ Wangwu │  50 │ Guangzhou │    90 │
└────┴────────┴─────┴───────────┴───────┘
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25

2. with

clickhouse1 :) 
clickhouse1 :) with 40 as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :) 
clickhouse1 :) with sum(score) as sum_score
:-] select city, toString(sum_score), toUInt16(sum_score) from select_table_test group by city;
┌─city──────┬─toString(sum_score)─┬─toUInt16(sum_score)─┐
│ Shanghai  │ 80                  │                  80 │
│ Beijing   │ 70                  │                  70 │
│ Guangzhou │ 90                  │                  90 │
└───────────┴─────────────────────┴─────────────────────┘
clickhouse1 :) 
clickhouse1 :) with (
:-] select age from select_table_test where city = 'Beijing'
:-] ) as fit_age 
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│  1 │ Zhangsan │  30 │ Beijing │    70 │
└────┴──────────┴─────┴─────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 也可在from子句中的select查询使用with

3. from

clickhouse1 :)
clickhouse1 :) select number from numbers(3);
┌─number─┐
│      0 │
│      1 │
│      2 │
└────────┘
clickhouse1 :) 
clickhouse1 :) select 10 from system.one;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :) 
clickhouse1 :) select 10;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • select 10会从虚拟表system.one查询
  • from后的final关键字,表示先强制partition合并,再得到结果

4. sample

  • 在数据相同,采样规则相同,采样结果是一样的
  • create table的sample by int_type必须整数类型
  1. sample 0 ~ 1
clickhouse1 :)
clickhouse1 :) select name, _sample_factor from select_table_test sample 0.4;
┌─name─────┬─_sample_factor─┐
│ Zhangsan │            2.5 │
└──────────┴────────────────┘
┌─name───┬─_sample_factor─┐
│ Wangwu │            2.5 │
└────────┴────────────────┘
clickhouse1 :) 
clickhouse1 :) select count(*) * any(_sample_factor) from select_table_test sample 4 / 10;
┌─multiply(count(), any(_sample_factor))─┐
│                                      5 │
└────────────────────────────────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 采样因子等于0时,表示不进行采用,返回所有数据
  1. sample 0 ~ 1 offset 0 ~ 1
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 0.3 offset 0.1;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│  1 │ Zhangsan │  30 │ Beijing │    70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│  3 │ Wangwu │  50 │ Guangzhou │    90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 假如共有10条数据,offset为0.8,sample为0.3;则会从数据的8/10(第9条)开始进行采样,采走3(10 * 0.3)条数据,但实际剩余只有2条数据,最终采样结果为2条数据
  1. sample rows
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│  1 │ Zhangsan │  30 │ Beijing │    70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│  3 │ Wangwu │  50 │ Guangzhou │    90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 表示采样后有rows行数据
  • 采样的rows参数必须大于1,当rows参数设置的比index_granularity索引粒度值小的多时,也会返回近似index_granularity条数据

5. array join

  • 表A array join 表A的数组Array或嵌套类型Nested字段,将一行数据打平至多行
  • 在一条select语句中,只能由一个array join(from子句除外)
  • 可以直接array join nested_parent_column as n
  1. 准备数据
clickhouse1 :)
clickhouse1 :) create table array_join_table(
:-] fruit String,
:-] values Array(UInt8),
:-] city String,
:-] values2 Array(UInt8)
:-] ) engine = TinyLog;
clickhouse1 :) 
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('banana', [1,2,3], 'Hainan', [18,19]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('orange', [66,88], 'Hunan', [6,7,8,9]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('apple',[], 'Shangxi', [11,12,13]);
clickhouse1 :) 
clickhouse1 :) select * from array_join_table;
┌─fruit──┬─values──┬─city────┬─values2────┐
│ banana │ [1,2,3] │ Hainan  │ [18,19]    │
│ orange │ [66,88] │ Hunan   │ [6,7,8,9]  │
│ apple  │ []      │ Shangxi │ [11,12,13] │
└────────┴─────────┴─────────┴────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  1. inner array join(默认)
clickhouse1 :) 
clickhouse1 :) select fruit, values from array_join_table array join values;
┌─fruit──┬─values─┐
│ banana │      1 │
│ banana │      2 │
│ banana │      3 │
│ orange │     66 │
│ orange │     88 │
└────────┴────────┘
clickhouse1 :) 
clickhouse1 :) select fruit, values, v from array_join_table array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │  1 │
│ banana │ [1,2,3] │  2 │
│ banana │ [1,2,3] │  3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
└────────┴─────────┴────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 排除掉了空数组
  1. left array join
clickhouse1 :) 
clickhouse1 :) select fruit, values, v from array_join_table left array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │  1 │
│ banana │ [1,2,3] │  2 │
│ banana │ [1,2,3] │  3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
│ apple  │ []      │  0 │
└────────┴─────────┴────┘
clickhouse1 :) 
clickhouse1 :) select fruit, values, v, arrayMap(x -> x * 2, values) as map_values, m_v from array_join_table left array join values as v, map_values as m_v;
┌─fruit──┬─values──┬──v─┬─map_values─┬─m_v─┐
│ banana │ [1,2,3] │  1 │ [2,4,6]    │   2 │
│ banana │ [1,2,3] │  2 │ [2,4,6]    │   4 │
│ banana │ [1,2,3] │  3 │ [2,4,6]    │   6 │
│ orange │ [66,88] │ 66 │ [132,176]  │ 132 │
│ orange │ [66,88] │ 88 │ [132,176]  │ 176 │
│ apple  │ []      │  0 │ []         │   0 │
└────────┴─────────┴────┴────────────┴─────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 数组为空的出现在结果中
  • 与多个数组字段进行array join时,在一行数据中多个数组的元素个数要相同
    ,也是将一行数据打平至多行

6. join

6.1 数据准备

join_table_test1

clickhouse1 :) 
clickhouse1 :) create table join_table_test1(
:-] id UInt32,
:-] name String,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :) 
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(1, 'clickhouse', '2021-08-10 11:00:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'spark', '2021-08-10 12:01:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'flink', '2021-08-10 12:02:00');
clickhouse1 :) 
clickhouse1 :) select * from join_table_test1;
┌─id─┬─name───────┬────────────datetime─┐
│  1 │ clickhouse │ 2021-08-10 11:00:00 │
└────┴────────────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│  2 │ spark │ 2021-08-10 12:01:00 │
└────┴───────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│  2 │ flink │ 2021-08-10 12:02:00 │
└────┴───────┴─────────────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

join_table_test2

clickhouse1 :) 
clickhouse1 :) create table join_table_test2(
:-] id UInt32,
:-] rate UInt8,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :) 
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 100, '2021-08-10 11:58:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 90, '2021-08-10 11:59:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(3, 80, '2021-08-10 13:00:00');
clickhouse1 :) 
clickhouse1 :) select * from join_table_test2;
┌─id─┬─rate─┬────────────datetime─┐
│  2 │  100 │ 2021-08-10 11:58:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│  2 │   90 │ 2021-08-10 11:59:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│  3 │   80 │ 2021-08-10 13:00:00 │
└────┴──────┴─────────────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

join_table_test3

clickhouse1 :) 
clickhouse1 :) create table join_table_test3(
:-] id UInt32,
:-] star UInt32
:-] ) engine = Memory();
clickhouse1 :) 
clickhouse1 :) insert into join_table_test3(id, star) values(1, 1000);
clickhouse1 :) insert into join_table_test3(id, star) values(2, 900);
clickhouse1 :) 
clickhouse1 :) select * from join_table_test3;
┌─id─┬─star─┐
│  1 │ 1000 │
└────┴──────┘
┌─id─┬─star─┐
│  2 │  900 │
└────┴──────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17

6.2 连接精度all配合连接类型(inner、left、right、full)

  • 默认连接精度,可通过join_default_strictness配置参数修改
  • 基准表一行数据与非基准表多行数据匹配,返回非基准表多行数据
clickhouse1 :) 
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
│  2 │ flink │   90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│  1 │ clickhouse │    0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
│  2 │ flink │   90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
│  2 │ flink │   90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│  0 │      │   80 │
└────┴──────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all full join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│  1 │ clickhouse │    0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
│  2 │ flink │   90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│  0 │      │   80 │
└────┴──────┴──────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53

6.3 连接精度any配合连接类型(inner、left、right)

  • 基准表一行数据与非基准表多行数据匹配,返回非基准表第一行数据
  • 注意inner join时,只返回了一行数据
clickhouse1 :) 
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│  1 │ clickhouse │    0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│  0 │      │   80 │
└────┴──────┴──────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26

6.4 连接精度asof配合连接类型(inner、left)

  • join key一般都是等于(join key1 = join key2)匹配,asof允许定义一个或多个等于(join key1 = join key2)匹配 + 一个不等于(join key1 >、>=、<、<= join key2)匹配
  • join的步骤
    1. join key等于匹配按all的连接精度关联后
    2. 再用join key不等于匹配取最近的一条;使用using时,最后一个字段为asof column,表示匹配右表asof column与左表asof column最近的值
  • asof column必须是整形、浮点型、日期型这种有序数据类型
clickhouse1 :) 
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ spark │   90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ flink │   90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  1 │ clickhouse │    0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ spark │   90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ flink │   90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b using(id, datetime);
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ spark │   90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ flink │   90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :) 
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b using(id, datetime);
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  1 │ clickhouse │    0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ spark │   90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│  2 │ flink │   90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39

6.5 连接类型cross join

clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a cross join join_table_test2 b;
┌─id─┬─name───────┬─rate─┐
│  1 │ clickhouse │  100 │
│  1 │ clickhouse │   90 │
│  1 │ clickhouse │   80 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ spark │  100 │
│  2 │ spark │   90 │
│  2 │ spark │   80 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│  2 │ flink │  100 │
│  2 │ flink │   90 │
│  2 │ flink │   80 │
└────┴───────┴──────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18

6.6 多表join、join的注意事项

  1. 多表join

对于3个表,先将前2个表进行join,再将它们的结果集与第3个表进行join; 其它情形以此类推

  1. 性能问题

    1. join时请遵循左大右小,因为会将右表加载至内存,与左表关联
    2. 可以考虑使用join表引擎或字典表提示join性能
  2. 空值策略

    • 默认是使用数据类型的默认值填充,可以将join_use_nulls参数设置为1,用Null值填充

7. where和prewhere

  • 会根据where条件字段,判定select是否启用了索引
  • prewhere只能用于mergeTree系列,先对条件字段进行过滤,再对符合的行,进行select其它字段
  • 当使用where的时候,clickhouse会根据where的条件字段,判断是否自动启用prewhere;但下列几种不会自动启用prewhere
    • 常量表达式where 1 = 1
    • where a = 3, a字段为primary key或使用了alias默认值
    • select a ...... where a = 3, select的字段与where字段相同
    • ...... from tb array join nest_field where nest_field.col1 = 10, select查询包含了array join、global in、global not in、 indexHint(函数)

8. group by

  1. 准备数据
clickhouse1 :) create table group_table_test(
:-] province String,
:-] city String,
:-] season String,
:-] sales Float64
:-] ) engine = Memory();
clickhouse1 :) 
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '春', 1100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '夏', 1200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '秋', 1300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '冬', 1400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '春', 2100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '夏', 2200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '秋', 2300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '冬', 2400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '春', 3100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '夏', 3200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '秋', 3300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '冬', 3400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '春', 4100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '夏', 4200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '秋', 4300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '冬', 4400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '春', 5100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '夏', 5200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '秋', 5300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '冬', 5400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '春', 6100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '夏', 6200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '秋', 6300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '冬', 6400);
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  1. with rollup
  • group by province, city with rollup: 从右向左进行上钻,形成组合[province, city]、[province]、[ ],再对sum(sales)进行计算
clickhouse1 :) 
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with rollup order by province;
┌─province─┬─city─┬─sum(sales)─┐
│          │      │      90000 │
│ 广东     │      │      63000 │
│ 广东     │ 深圳 │      25000 │
│ 广东     │ 广州 │      17000 │
│ 广东     │ 东莞 │      21000 │
│ 湖南     │      │      27000 │
│ 湖南     │ 株洲 │       9000 │
│ 湖南     │ 湘潭 │      13000 │
│ 湖南     │ 长沙 │       5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  1. with cube
  • group by province, city with cube: 对所有维度进行组合,形成组合[province, city]、[province]、[city]、[ ],再对sum(sales)进行计算
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with cube order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│          │      │      90000 │
│          │ 东莞 │      21000 │
│          │ 广州 │      17000 │
│          │ 株洲 │       9000 │
│          │ 深圳 │      25000 │
│          │ 湘潭 │      13000 │
│          │ 长沙 │       5000 │
│ 广东     │      │      63000 │
│ 广东     │ 东莞 │      21000 │
│ 广东     │ 广州 │      17000 │
│ 广东     │ 深圳 │      25000 │
│ 湖南     │      │      27000 │
│ 湖南     │ 株洲 │       9000 │
│ 湖南     │ 湘潭 │      13000 │
│ 湖南     │ 长沙 │       5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  1. with totals
  • 对所有的数据进行sum(sales)计算
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with totals order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│ 广东     │ 东莞 │      21000 │
│ 广东     │ 广州 │      17000 │
│ 广东     │ 深圳 │      25000 │
│ 湖南     │ 株洲 │       9000 │
│ 湖南     │ 湘潭 │      13000 │
│ 湖南     │ 长沙 │       5000 │
└──────────┴──────┴────────────┘

Totals:
┌─province─┬─city─┬─sum(sales)─┐
│          │      │      90000 │
└──────────┴──────┴────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16

9. having

对group by聚合之后的结果,再次进行过滤

clickhouse1 :) 
clickhouse1 :) select province, city, sum(sales) total_sales from group_table_test group by province, city having total_sales > 10000 order by province, city;
┌─province─┬─city─┬─total_sales─┐
│ 广东     │ 东莞 │       21000 │
│ 广东     │ 广州 │       17000 │
│ 广东     │ 深圳 │       25000 │
│ 湖南     │ 湘潭 │       13000 │
└──────────┴──────┴─────────────┘
clickhouse1 :)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

10. order by

  • 不同partition的数据不能进行排序
  1. nulls first
clickhouse1 :)
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls first;
┌────v─┐
│ ᴺᵁᴸᴸ │
│  nan │
│   30 │
│   50 │
└──────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  1. nulls last
  • 默认操作
clickhouse1 :) 
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls last;
┌────v─┐
│   30 │
│   50 │
│  nan │
│ ᴺᵁᴸᴸ │
└──────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10

11. limit by

clickhouse1 :)
clickhouse1 :) select province, city, season, sales from group_table_test limit 1 offset 1 by province, city;
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南     │ 长沙 │ 夏     │  1200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南     │ 株洲 │ 夏     │  2200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南     │ 湘潭 │ 夏     │  3200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东     │ 广州 │ 夏     │  4200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东     │ 东莞 │ 夏     │  5200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东     │ 深圳 │ 夏     │  6200 │
└──────────┴──────┴────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 执行于order by之后和limit之前
  • 按province、city进行分组,在每组中,跳过1条数据,取1条数据
  • 简写形式:limit 1, 1 by province, city

12. limit

clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1 offset 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1, 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 如结果数据位于多个partition, 且未使用order by, 每次limit返回的数据可能不同

13. select

正则匹配查询字段

clickhouse1 :)
clickhouse1 :) select columns('^i'), columns('a') from select_table_test;
┌─id─┬─name─────┬─age─┐
│  1 │ Zhangsan │  30 │
│  2 │ Lisi     │  40 │
│  3 │ Wangwu   │  50 │
└────┴──────────┴─────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 返回以i开头的字段,和包含a的字段

14. union all

clickhouse1 :) 
clickhouse1 :) select * from select_table_test where id = 1
:-] union all
:-] select * from select_table_test where id = 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│  1 │ Zhangsan │  30 │ Beijing │    70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│  2 │ Lisi │  40 │ Shanghai │    80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • union all后的字段名称以左表的为准

15. 执行计划

clickhouse1 :) 
clickhouse1 :) explain
:-] with 30 as fit_age
:-] select age, count() num from select_table_test
:-] where age = fit_age
:-] group by age 
:-] having num = 1 
:-] order by num 
:-] limit 1;
┌─explain─────────────────────────────────────────────────────────────────────────────────────┐
│ Expression (Projection)                                                                     │
│   Limit (preliminary LIMIT)                                                                 │
│     MergingSorted (Merge sorted streams for ORDER BY)                                       │
│       MergeSorting (Merge sorted blocks for ORDER BY)                                       │
│         PartialSorting (Sort each block for ORDER BY)                                       │
│           Expression (Before ORDER BY)                                                      │
│             Filter (HAVING)                                                                 │
│               Aggregating                                                                   │
│                 Expression (Before GROUP BY)                                                │
│                   Filter (WHERE)                                                            │
│                     SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│                       ReadFromMergeTree                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
clickhouse1 :) 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24

也可以通过查看执行日志来查看具体的信息:

[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# clickhouse-client -u default --password default123 --send_logs_level=trace <<< 'select * from select_table_test' > /dev/null
[clickhouse1] 2021.08.09 18:20:59.417657 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> executeQuery: (from [::1]:39664, using production parser) select * from select_table_test 
[clickhouse1] 2021.08.09 18:20:59.418669 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> ContextAccess (default): Access granted: SELECT(id, name, age, city, score) ON default.select_table_test
[clickhouse1] 2021.08.09 18:20:59.418786 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Key condition: unknown
[clickhouse1] 2021.08.09 18:20:59.418808 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1/1 marks by primary key, 1 marks to read from 1 ranges
[clickhouse1] 2021.08.09 18:20:59.418867 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse1] 2021.08.09 18:20:59.418954 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MergeTreeSelectProcessor: Reading 1 ranges from part all_1_3_1, approx. 3 rows starting from 0
[clickhouse1] 2021.08.09 18:20:59.420643 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Information> executeQuery: Read 3 rows, 135.00 B in 0.0028947 sec., 1036 rows/sec., 45.54 KiB/sec.
[clickhouse1] 2021.08.09 18:20:59.420674 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
[root@clickhouse1 ~]# 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/小丑西瓜9/article/detail/273284
推荐阅读
相关标签
  

闽ICP备14008679号