当前位置:   article > 正文

一天学会PostgreSQL应用开发与管理 - 4 高级SQL用法

postgresql高级统计sql教程

背景

本章大纲

1. 聚集函数

常用聚合函数
统计类的聚合函数
分组排序聚合
Hypothetical-Set聚合函数
多维分析

2. 子查询

3. union\union all\except\intersect

4. 自连接

5. 内连接

优化器如何强制连接顺序?

6. 外连接

左外连接
右外连接
全外连接

7. 窗口查询

第二章:高级SQL用法

1. 聚集函数

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

常用聚合函数
FunctionArgument Type(s)Return TypeDescription
array_agg(expression)anyarray of the argument typeinput values, including nulls, concatenated into an array
avg(expression)smallint, int, bigint, real, double precision, numeric, or intervalnumeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data typethe average (arithmetic mean) of all input values
bit_and(expression)smallint, int, bigint, or bitsame as argument data typethe bitwise AND of all non-null input values, or null if none
bit_or(expression)smallint, int, bigint, or bitsame as argument data typethe bitwise OR of all non-null input values, or null if none
bool_and(expression)boolbooltrue if all input values are true, otherwise false
bool_or(expression)boolbooltrue if at least one input value is true, otherwise false
count(*)-bigintnumber of input rows
count(expression)anybigintnumber of input rows for which the value of expression is not null
every(expression)boolboolequivalent to bool_and
json_agg(expression)anyjsonaggregates values as a JSON array
json_object_agg(name,value)(any, any)jsonaggregates name/value pairs as a JSON object
max(expression)any array, numeric, string, or date/time typesame as argument typemaximum value of expression across all input values
min(expression)any array, numeric, string, or date/time typesame as argument typeminimum value of expression across all input values
string_agg(expression,delimiter)(text, text) or (bytea, bytea)same as argument typesinput values concatenated into a string, separated by delimiter
sum(expression)smallint, int, bigint, real, double precision, numeric, interval, or moneybigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data typesum of expression across all input values
xmlagg(expression)xmlxmlconcatenation of XML values (see alsoSection 9.14.1.7)

上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.

使用sum, array_agg时, 当没有行输入, 返回NULL可能有点别扭, 可以使用coalesce来替代NULL, 如coalesce(sum(x), 0)

coalesce(array_agg(x), '{}'::int[])

例子 :

聚合后得到数组, null将计入数组元素

  1. postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);
  2. array_agg
  3. ------------
  4. {NULL,1,2}
  5. (1 row)

算平均值时不计算null

  1. postgres=# select avg(id) from (values(null),(1),(2)) as t(id);
  2. avg
  3. --------------------
  4. 1.5000000000000000
  5. (1 row)

算bit与|或 时也不计算NULL

  1. postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);
  2. bit_and
  3. ---------
  4. 0
  5. (1 row)
  6. postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);
  7. bit_or
  8. --------
  9. 3
  10. (1 row)

算布尔逻辑时也不计算NULL

  1. postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);
  2. bool_and
  3. ----------
  4. f
  5. (1 row)

every是bool_and的别名, 实际上是SQL标准中定义的.

  1. postgres=# select every(id) from (values(null),(true),(false)) as t(id);
  2. every
  3. -------
  4. f
  5. (1 row)

SQL标准中还定义了any和some为bool_or的别名, 但是因为any和some还可以被解释为子查询, 所以在PostgreSQL中any和some的布尔逻辑聚合不可用.

  1. postgres=# select any(id) from (values(null),(true),(false)) as t(id);
  2. ERROR: syntax error at or near "any"
  3. LINE 1: select any(id) from (values(null),(true),(false)) as t(id);
  4. ^
  5. postgres=# select some(id) from (values(null),(true),(false)) as t(id);
  6. ERROR: syntax error at or near "some"
  7. LINE 1: select some(id) from (values(null),(true),(false)) as t(id);
  8. ^

bool_or的例子

  1. postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);
  2. bool_or
  3. ---------
  4. t
  5. (1 row)

计算非空的表达式个数, count带表达式时, 不计算null

  1. postgres=# select count(id) from (values(null),(1),(2)) as t(id);
  2. count
  3. -------
  4. 2
  5. (1 row)

计算表达式(含空值)的个数, count()计算null, 注意count()是一个独立的聚合函数. 请和count(express)区分开来.

  1. postgres=# select count(*) from (values(null),(1),(2)) as t(id);
  2. count
  3. -------
  4. 3
  5. (1 row)
  6. postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);
  7. count
  8. -------
  9. 4
  10. (1 row)

聚合后得到json, 不带key的json聚合

  1. postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);
  2. json_agg
  3. ---------------------
  4. [null, true, false]
  5. (1 row)

聚合后得到json, 带key的json聚合, 注意key不能为null, 否则报错.

  1. postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);
  2. json_object_agg
  3. -----------------------------------------
  4. { "a" : null, "b" : true, "c" : false }
  5. (1 row)
  6. postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);
  7. ERROR: 22023: field name must not be null
  8. LOCATION: json_object_agg_transfn, json.c:1959

计算最大最小值, max, min都不计算null

  1. postgres=# select max(id) from (values(null),(1),(2)) as t(id);
  2. max
  3. -----
  4. 2
  5. (1 row)
  6. postgres=# select min(id) from (values(null),(1),(2)) as t(id);
  7. min
  8. -----
  9. 1
  10. (1 row)

聚合后得到字符串, 字符串聚合

  1. postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);
  2. string_agg
  3. ------------
  4. a***b***c
  5. (1 row)
  6. postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);
  7. string_agg
  8. ---------------
  9. digoal***zhou
  10. (1 row)

计算总和, sum不计算null, 当所有行都是null时, 即没有任何行输入, 返回null.

  1. postgres=# select sum(id) from (values(null),(1),(2)) as t(id);
  2. sum
  3. -----
  4. 3
  5. (1 row)
  6. postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);
  7. sum
  8. -----
  9. (1 row)
  10. (1 row)

聚合后得到xml

  1. postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);
  2. xmlagg
  3. -------------------------
  4. <foo>digoal</foo><bar/>
  5. (1 row)

某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?

支持聚合函数中使用order by的PostgreSQL版本可以用如下语法 :

  1. postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);
  2. string_agg
  3. ---------------
  4. digoal***zhou
  5. (1 row)
  6. postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);
  7. string_agg
  8. ---------------
  9. zhou***digoal
  10. (1 row)
统计类的聚合函数
FunctionArgument TypeReturn TypeDescription
corr(Y, X)double precisiondouble precisioncorrelation coefficient
covar_pop(Y, X)double precisiondouble precisionpopulation covariance
covar_samp(Y, X)double precisiondouble precisionsample covariance
regr_avgx(Y, X)double precisiondouble precisionaverage of the independent variable (sum(X)/N)
regr_avgy(Y, X)double precisiondouble precisionaverage of the dependent variable (sum(Y)/N)
regr_count(Y, X)double precisionbigintnumber of input rows in which both expressions are nonnull
regr_intercept(Y, X)double precisiondouble precisiony-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_r2(Y, X)double precisiondouble precisionsquare of the correlation coefficient
regr_slope(Y, X)double precisiondouble precisionslope of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_sxx(Y, X)double precisiondouble precisionsum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)
regr_sxy(Y, X)double precisiondouble precisionsum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)
regr_syy(Y, X)double precisiondouble precisionsum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)
stddev(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numerichistorical alias for stddev_samp
stddev_pop(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numericpopulation standard deviation of the input values
stddev_samp(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numericsample standard deviation of the input values
variance(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numerichistorical alias for var_samp
var_pop(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numericpopulation variance of the input values (square of the population standard deviation)
var_samp(expression)smallint, int, bigint, real, double precision, or numericdouble precision for floating-point arguments, otherwise numericsample variance of the input values (square of the sample standard deviation)

相关性统计 :

corr, regr_r2

总体|样本 方差, 标准方差 :

variance, var_pop, var_samp

stddev, stddev_pop, stddev_samp

总体协方差, 样本协方差 :

covar_pop, covar_samp

线性回归 :

regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相关度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.

分组排序聚合
FunctionDirect Argument Type(s)Aggregated Argument Type(s)Return TypeDescription
mode() WITHIN GROUP (ORDER BYsort_expression)-any sortable typesame as sort expressionreturns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)double precisiondouble precisionor intervalsame as sort expressioncontinuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)double precision[]double precisionor intervalarray of sort expression's typemultiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)double precisionany sortable typesame as sort expressiondiscrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression)double precision[]any sortable typearray of sort expression's typemultiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile

mode比较好理解, 就是取分组中出现频率最高的值或表达式, 如果最高频率的值有多个, 则随机取一个.

  1. postgres=# create table test(id int, info text);
  2. CREATE TABLE
  3. postgres=# insert into test values (1,'test1');
  4. INSERT 0 1
  5. postgres=# insert into test values (1,'test1');
  6. INSERT 0 1
  7. postgres=# insert into test values (1,'test2');
  8. INSERT 0 1
  9. postgres=# insert into test values (1,'test3');
  10. INSERT 0 1
  11. postgres=# insert into test values (2,'test1');
  12. INSERT 0 1
  13. postgres=# insert into test values (2,'test1');
  14. INSERT 0 1
  15. postgres=# insert into test values (2,'test1');
  16. INSERT 0 1
  17. postgres=# insert into test values (3,'test4');
  18. INSERT 0 1
  19. postgres=# insert into test values (3,'test4');
  20. INSERT 0 1
  21. postgres=# insert into test values (3,'test4');
  22. INSERT 0 1
  23. postgres=# insert into test values (3,'test4');
  24. INSERT 0 1
  25. postgres=# insert into test values (3,'test4');
  26. INSERT 0 1
  27. postgres=# select * from test;
  28. id | info
  29. ----+-------
  30. 1 | test1
  31. 1 | test1
  32. 1 | test2
  33. 1 | test3
  34. 2 | test1
  35. 2 | test1
  36. 2 | test1
  37. 3 | test4
  38. 3 | test4
  39. 3 | test4
  40. 3 | test4
  41. 3 | test4
  42. (12 rows)

取出所有数据中, 出现频率最高的info, 有可能是test1也有可能是test4, 因为他们的出现频率一致.

mode的返回结果数据类型和order by后面的表达式一致.

  1. postgres=# select mode() within group (order by info) from test;
  2. mode
  3. -------
  4. test1
  5. (1 row)

如果按INFO来分组的话, 取出出现频率最高的info, 实际上这个操作是没有任何意义的, 返回值就是所有记录的info的唯一值.

  1. postgres=# select mode() within group (order by info) from test group by info;
  2. mode
  3. -------
  4. test1
  5. test2
  6. test3
  7. test4
  8. (4 rows)

按id来分组, 取出组内出现频率最高的info值, 这个是有意义的.

  1. postgres=# select mode() within group (order by info) from test group by id;
  2. mode
  3. -------
  4. test1
  5. test1
  6. test4
  7. (3 rows)

id=1 , 出现频率最高的info是test1. 出现2次.

如下 :

  1. postgres=# select id,info,count(*) from test group by id,info;
  2. id | info | count
  3. ----+-------+-------
  4. 1 | test1 | 2
  5. 1 | test3 | 1
  6. 3 | test4 | 5
  7. 1 | test2 | 1
  8. 2 | test1 | 3
  9. (5 rows)

如果要返回mode()并返回频率次数. 可以使用row_number()窗口来实现. 如下.

  1. postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;
  2. id | info | cnt
  3. ----+-------+-----
  4. 1 | test1 | 2
  5. 2 | test1 | 3
  6. 3 | test4 | 5
  7. (3 rows)

其他, mode的返回结果数据类型和order by后面的表达式一致.

  1. postgres=# select mode() within group (order by id) from test;
  2. mode
  3. ------
  4. 3
  5. (1 row)
  6. postgres=# select mode() within group (order by id+1) from test;
  7. mode
  8. ------
  9. 4
  10. (1 row)

另外还有4个函数是和数据分布有关的, 需要指定从0到1的分布位置. 返回排序后, 在指定分布位置的值或表达式的值.

  1. src/backend/utils/adt/orderedsetaggs.c
  2. if (percentile < 0 || percentile > 1 || isnan(percentile))
  3. ereport(ERROR,
  4. (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
  5. errmsg("percentile value %g is not between 0 and 1",
  6. percentile)));

同时还需要注意区分连续分布和离散分布.

  1. postgres=# create table test(id int, info text);
  2. CREATE TABLE
  3. postgres=# insert into test values (1,'test1');
  4. INSERT 0 1
  5. postgres=# insert into test values (2,'test2');
  6. INSERT 0 1
  7. postgres=# insert into test values (3,'test2');
  8. INSERT 0 1
  9. postgres=# insert into test values (4,'test2');
  10. INSERT 0 1
  11. postgres=# insert into test values (5,'test2');
  12. INSERT 0 1
  13. postgres=# insert into test values (6,'test2');
  14. INSERT 0 1
  15. postgres=# insert into test values (7,'test2');
  16. INSERT 0 1
  17. postgres=# insert into test values (8,'test3');
  18. INSERT 0 1
  19. postgres=# insert into test values (100,'test3');
  20. INSERT 0 1
  21. postgres=# insert into test values (1000,'test4');
  22. INSERT 0 1
  23. postgres=# select * from test;
  24. id | info
  25. ------+-------
  26. 1 | test1
  27. 2 | test2
  28. 3 | test2
  29. 4 | test2
  30. 5 | test2
  31. 6 | test2
  32. 7 | test2
  33. 8 | test3
  34. 100 | test3
  35. 1000 | test4
  36. (10 rows)

取连续分布的中位数可以用percentile_cont(0.5)来获得.

  1. postgres=# select percentile_cont(0.5) within group (order by id) from test;
  2. percentile_cont
  3. -----------------
  4. 5.5
  5. (1 row)

这个5.5是怎么计算来的呢? 参考本文末尾 :

  1. If (CRN = FRN = RN) then the result is
  2. (value of expression from row at RN)
  3. Otherwise the result is
  4. (CRN - RN) * (value of expression for row at FRN) +
  5. (RN - FRN) * (value of expression for row at CRN)

解释 :

N = 当前分组的行数 = 10

RN = (1+传入参数*(N-1)) = (1+0.5*(10-1)) = 5.5

CRN = ceiling(RN) = 6

FRN = floor(RN) = 5

value of expression for row at FRN : 当前分组内第FRN行的值 = 5

value of expression for row at CRN : 当前分组内第CRN行的值 = 6

所以最终中位数值 :

  1. (CRN - RN) * (value of expression for row at FRN) +
  2. (RN - FRN) * (value of expression for row at CRN) =
  3. (6-5.5)*(5) + (5.5 - 5)*(6) = 5.5;

使用info分组 :

  1. postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
  2. percentile_cont | info
  3. -----------------+-------
  4. 1 | test1
  5. 4.5 | test2
  6. 54 | test3
  7. 1000 | test4
  8. (4 rows)

验证这个值 4.5 | test2 :

  1. 2 | test2
  2. 3 | test2
  3. 4 | test2
  4. 5 | test2
  5. 6 | test2
  6. 7 | test2

N = 当前分组的行数 = 6

RN = (1+传入参数*(N-1)) = (1+0.5*(6-1)) = 3.5

CRN = ceiling(RN) = 4

FRN = floor(RN) = 3

value of expression for row at FRN : 当前分组内第FRN行的值 = 4

value of expression for row at CRN : 当前分组内第CRN行的值 = 5

所以最终中位数值 :

  1. (CRN - RN) * (value of expression for row at FRN) +
  2. (RN - FRN) * (value of expression for row at CRN) =
  3. (4-3.5)*(4) + (3.5 - 3)*(5) = 4.5;

当输入参数为数组时, 返回值也是数组, 如下 :

  1. postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;
  2. percentile_cont
  3. -----------------
  4. {5.5,1000}
  5. (1 row)

接下来看一下稀疏分布 :

返回行号大于等于指定百分比的值或表达式值.

例如 :

  1. postgres=# select id from test;
  2. id
  3. ------
  4. 1
  5. 2
  6. 3
  7. 4
  8. 5
  9. 6
  10. 7
  11. 8
  12. 100
  13. 1000
  14. (10 rows)

当前组一共10行, 取位置在0.5的.即行号>=0.5*10的第一行的值或表达式的值.

  1. postgres=# select percentile_disc(0.5) within group (order by id) from test;
  2. percentile_disc
  3. -----------------
  4. 5
  5. (1 row)
  6. postgres=# select percentile_disc(0.5) within group (order by id^2) from test;
  7. percentile_disc
  8. -----------------
  9. 25
  10. (1 row)

输入0.11, 表示行号返回>=1.1的第一行的值.

  1. postgres=# select percentile_disc(0.11) within group (order by id) from test;
  2. percentile_disc
  3. -----------------
  4. 2
  5. (1 row)

再看个例子

  1. postgres=# select id,info,count(*) over (partition by info) from test;
  2. id | info | count
  3. ------+-------+-------
  4. 1 | test1 | 1
  5. 2 | test2 | 6
  6. 3 | test2 | 6
  7. 4 | test2 | 6
  8. 5 | test2 | 6
  9. 6 | test2 | 6
  10. 7 | test2 | 6
  11. 8 | test3 | 2
  12. 100 | test3 | 2
  13. 1000 | test4 | 1
  14. (10 rows)

取分组的数据, 主要看test2 这个组一共有6行, 0.3*6=1.8, 所以它需要取第二行的数据.

  1. postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;
  2. info | percentile_disc
  3. -------+-----------------
  4. test1 | 1
  5. test2 | 3
  6. test3 | 8
  7. test4 | 1000
  8. (4 rows)

注意

最终计算的是表达式的分布数, 而不是计算列值的分布数后再计算表达式.

验证如下 :

或者你可以看代码 :

  1. postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;
  2. percentile_cont | info
  3. -----------------+-------
  4. 1 | test1
  5. 20.5 | test2
  6. 5032 | test3
  7. 1000000 | test4
  8. (4 rows)
  9. postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
  10. percentile_cont | info
  11. -----------------+-------
  12. 1 | test1
  13. 4.5 | test2
  14. 54 | test3
  15. 1000 | test4
  16. (4 rows)
  17. postgres=# select 4.5^2;
  18. ?column?
  19. ---------------------
  20. 20.2500000000000000
  21. (1 row)
  22. postgres=# select 54^2;
  23. ?column?
  24. ----------
  25. 2916
  26. (1 row)
Hypothetical-Set聚合函数
函数返回类型描述
rank()bigintrank of the current row with gaps; same as row_number of its first peer
dense_rank()bigintrank of the current row without gaps; this function counts peer groups
percent_rank()double precisionrelative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist()double precisionrelative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

the four ranking functions are defined so that they give the same answer for any two peer rows.

rank 返回值在分组内的等级, 如果值有重复的话, 跳级处理.

dense_rank 返回值在分组内的等级, 如果值有重复的话, 不跳级处理.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至当前记录等级一共有多少行除以本组的总行数.

看一个例子比较明白.

  1. postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;
  2. id | info | row_number | rank | dense_rank | percent_rank | cume_dist
  3. ------+-------+------------+------+------------+--------------+-------------------
  4. 1 | test1 | 1 | 1 | 1 | 0 | 1
  5. 2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
  6. 2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
  7. 2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
  8. 2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444
  9. 3 | test2 | 5 | 5 | 2 | 0.5 | 0.555555555555556
  10. 4 | test2 | 6 | 6 | 3 | 0.625 | 0.666666666666667
  11. 5 | test2 | 7 | 7 | 4 | 0.75 | 0.777777777777778
  12. 6 | test2 | 8 | 8 | 5 | 0.875 | 0.888888888888889
  13. 7 | test2 | 9 | 9 | 6 | 1 | 1
  14. 8 | test3 | 1 | 1 | 1 | 0 | 0.5
  15. 100 | test3 | 2 | 2 | 2 | 1 | 1
  16. 1000 | test4 | 1 | 1 | 1 | 0 | 1
  17. (13 rows)

算法 :

以info='test2'这个组为例 :

  1. 2 | test2 | 1 | 1 | 1 | 0 | 0.444444444444444
  2. 2 | test2 | 2 | 1 | 1 | 0 | 0.444444444444444
  3. 2 | test2 | 3 | 1 | 1 | 0 | 0.444444444444444
  4. 2 | test2 | 4 | 1 | 1 | 0 | 0.444444444444444

id=2 的 rank和dense_rank都是1.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以

  1. percent_rank = (1-1)/(9-1)=0
  2. cume_dist = (4)/(9) = 0.444444444444444
    3 | test2 |          5 |    5 |          2 |          0.5 | 0.555555555555556    

rank = 5, 跳级

dense_rank = 2, 不跳级

  1. percent_rank = (5-1)/(9-1)=0.5
  2. cume_dist = (5)/(9) = 0.555555555555556

这些窗口函数的另一种用法, 聚合用法.

FunctionDirect Argument Type(s)Aggregated Argument Type(s)Return TypeDescription
rank(args) WITHIN GROUP (ORDER BY sorted_args)VARIADIC "any"VARIADIC "any"bigintrank of the hypothetical row, with gaps for duplicate rows
dense_rank(args) WITHIN GROUP (ORDER BY sorted_args)VARIADIC "any"VARIADIC "any"bigintrank of the hypothetical row, without gaps
percent_rank(args) WITHIN GROUP (ORDER BYsorted_args)VARIADIC "any"VARIADIC "any"double precisionrelative rank of the hypothetical row, ranging from 0 to 1
cume_dist(args) WITHIN GROUP (ORDER BY sorted_args)VARIADIC "any"VARIADIC "any"double precisionrelative rank of the hypothetical row, ranging from 1/N to 1

这些用法比较奇特, 其实是要返回给定参数在集合中的位置.

例如 :

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5

如果我们给一个参数值是2.2, 应该排在以上数据中的第三行.

例子 :

  1. postgres=# select * from test order by info,id;
  2. id | info
  3. ------+-------
  4. 1 | test1
  5. 2 | test2
  6. 2 | test2
  7. 2 | test2
  8. 2 | test2
  9. 3 | test2
  10. 4 | test2
  11. 5 | test2
  12. 6 | test2
  13. 7 | test2
  14. 8 | test3
  15. 100 | test3
  16. 1000 | test4
  17. (13 rows)
  18. postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;
  19. info | rank | dense_rank
  20. -------+------+------------
  21. test1 | 2 | 2
  22. test2 | 7 | 4
  23. test3 | 1 | 1
  24. test4 | 1 | 1
  25. (4 rows)

4.9在test1这个分组, 排名第2, 并且这个分组只有1个值, 所以没有gap.

重点关注test2这个组, 这个组有9个值, 其中有4个重复值2, 所以4.9在这里排名需要考虑gap.

rank 返回7, 即4.9在这里考虑GAP排名第7

dense_rank 返回4, 即4.9在这里不考虑GAP排名第4.

又如 :

  1. postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;
  2. info | rank | dense_rank
  3. -------+------+------------
  4. test1 | 2 | 2
  5. test2 | 7 | 4
  6. test3 | 1 | 1
  7. test4 | 1 | 1
  8. (4 rows)
  9. postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;
  10. info | rank | dense_rank
  11. -------+------+------------
  12. test1 | 2 | 2
  13. test2 | 8 | 5
  14. test3 | 1 | 1
  15. test4 | 1 | 1
  16. (4 rows)

最后要看计算0~1代表位置的聚合函数percent_rank和cume_dist.

算法

percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

例子1 :

  1. postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;
  2. info | rank | dense_rank | percent_rank | cume_dist
  3. -------+------+------------+-------------------+-------------------
  4. test1 | 2 | 2 | 1 | 1
  5. test2 | 7 | 4 | 0.666666666666667 | 0.7
  6. test3 | 1 | 1 | 0 | 0.333333333333333
  7. test4 | 1 | 1 | 0 | 0.5
  8. (4 rows)

同样以test2为分组, 讲解算法. 把4.9插入到这个分组后. 数据应该变成 :

  1. 2 | test2
  2. 2 | test2
  3. 2 | test2
  4. 2 | test2
  5. 3 | test2
  6. 4 | test2
  7. 4.9 | test2 # 计算位置
  8. 5 | test2
  9. 6 | test2
  10. 7 | test2

一共10行.

percent_rank 返回 (rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以4.9对应的percent_rank 和 cume_dist 分别为 :

  1. percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
  2. cume_dist = (7)/10 = 0.7

例子2 :

  1. postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;
  2. info | rank | dense_rank | percent_rank | cume_dist
  3. -------+------+------------+-------------------+-------------------
  4. test1 | 2 | 2 | 1 | 1
  5. test2 | 7 | 4 | 0.666666666666667 | 0.8
  6. test3 | 1 | 1 | 0 | 0.333333333333333
  7. test4 | 1 | 1 | 0 | 0.5
  8. (4 rows)

插入计算值5后, 数据变成

  1. 2 | test2
  2. 2 | test2
  3. 2 | test2
  4. 2 | test2
  5. 3 | test2
  6. 4 | test2
  7. 5 | test2 # 计算位置, 即参数值
  8. 5 | test2
  9. 6 | test2
  10. 7 | test2

依旧10行. 但是截至当前记录等级一共有多少行? 注意是8了.

  1. percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667
  2. cume_dist = (8)/10 = 0.8

例子3 :

  1. postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;
  2. info | rank | dense_rank | percent_rank | cume_dist
  3. -------+------+------------+-------------------+-------------------
  4. test1 | 2 | 2 | 1 | 1
  5. test2 | 8 | 5 | 0.777777777777778 | 0.8
  6. test3 | 1 | 1 | 0 | 0.333333333333333
  7. test4 | 1 | 1 | 0 | 0.5
  8. (4 rows)

插入计算值5.1后, 数据变成 :

  1. 2 | test2
  2. 2 | test2
  3. 2 | test2
  4. 2 | test2
  5. 3 | test2
  6. 4 | test2
  7. 5 | test2
  8. 5.1 | test2 # 计算位置, 即参数值
  9. 6 | test2
  10. 7 | test2

例子4 :

  1. postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;
  2. info | rank | dense_rank | percent_rank | cume_dist
  3. -------+------+------------+-------------------+-----------
  4. test1 | 1 | 1 | 0 | 0.5
  5. test2 | 3 | 3 | 0.222222222222222 | 0.4
  6. test3 | 3 | 3 | 1 | 1
  7. test4 | 2 | 2 | 1 | 1
  8. (4 rows)

插入计算值5后, 数据变成 :

  1. 7 | test2
  2. 6 | test2
  3. 5 | test2 # 注意, 这才是计算位置, 即插入位置.
  4. 5 | test2
  5. 4 | test2
  6. 3 | test2
  7. 2 | test2
  8. 2 | test2
  9. 2 | test2
  10. 2 | test2
多维分析
  1. => SELECT * FROM items_sold;
  2. brand | size | sales
  3. -------+------+-------
  4. Foo | L | 10
  5. Foo | M | 20
  6. Bar | M | 15
  7. Bar | L | 5
  8. (4 rows)
  9. => SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());
  10. brand | size | sum
  11. -------+------+-----
  12. Foo | | 30
  13. Bar | | 20
  14. | L | 15
  15. | M | 35
  16. | | 50
  17. (5 rows)

其中GROUP BY GROUPING SETS ((brand), (size), ());

相当于以下三个group by的union all(未在分组的列以NULL代替。):

  1. group by brand
  2. group by size
  3. group by ()

分组集合除了可以用GROUPING SETS来指定,另外还提供了两个特殊的写法rollup和cube.

ROLLUP ( e1, e2, e3, ... )    

代表递减分组,一般用于异构结构的分组如国家,省份,城市,乡镇这样的结构查询。

逐级分组汇总结果,它相当于如下写法:

  1. GROUPING SETS (
  2. ( e1, e2, e3, ... ),
  3. ...
  4. ( e1, e2 )
  5. ( e1 )
  6. ( ) -- 注意包含全集
  7. )

还有一种写法是CUBE

CUBE ( a, b, c )    

cube是任意组合,相当于:

  1. GROUPING SETS (
  2. ( a, b, c ),
  3. ( a, b ),
  4. ( a, c ),
  5. ( a ),
  6. ( b, c ),
  7. ( b ),
  8. ( c ),
  9. ( ) -- 注意包含全集
  10. )

在cube和rollup中使用括号可以将多个表达式作为单个表达式来处理:

ROLLUP ( a, (b,c), d )    

递减,相当于

  1. GROUPING SETS (
  2. ( a, b, c, d ),
  3. ( a, b, c ),
  4. ( a ),
  5. ( )
  6. )
CUBE ( (a,b), (c,d) )    

相当于:

  1. GROUPING SETS (
  2. ( a, b, c, d ),
  3. ( a, b ),
  4. ( c, d ),
  5. ( )
  6. )

同时cube,rollup,grouping sets还可以混合使用:

GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))    

相当于:

  1. GROUP BY GROUPING SETS (
  2. (a,b,c,d), (a,b,c,e),
  3. (a,b,d), (a,b,e),
  4. (a,c,d), (a,c,e),
  5. (a,d), (a,e)
  6. )

既然分组聚合相当于多个group by 的union all,那么我们如何区分当前输出的记录是哪个分组group by的输出呢?

用grouping(cols)可以表示未参与聚合的表达式的比特位,并转换为INT输出。

例如:

  1. => SELECT * FROM items_sold;
  2. make | model | sales
  3. -------+-------+-------
  4. Foo | GT | 10
  5. Foo | Tour | 20
  6. Bar | City | 15
  7. Bar | Sport | 5
  8. (4 rows)

grouping()中必须包含group by后面的任意或所有列。

  1. => SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);
  2. make | model | grouping | sum
  3. -------+-------+----------+-----
  4. Foo | GT | 0 | 10
  5. Foo | Tour | 0 | 20
  6. Bar | City | 0 | 15
  7. Bar | Sport | 0 | 5
  8. Foo | | 1 | 30
  9. Bar | | 1 | 20
  10. | | 3 | 50
  11. (7 rows)

grouping()中必须包含group by后面的任意或所有列,不能包含未参与聚合的列。

grouping()中的每个表达式用1个比特位表示。

  1. postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);
  2. CREATE TABLE
  3. postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);
  4. c1 | count | grouping
  5. ----+-------+----------
  6. | 0 | 7 -- b'111'::int = 7
  7. (1 row)
  8. postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);
  9. c1 | count | grouping
  10. ----+-------+----------
  11. | 0 | 3 b'11'::int = 3
  12. (1 row)
  13. postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);
  14. c1 | count | grouping
  15. ----+-------+----------
  16. | 0 | 1 b'1'::int = 1
  17. (1 row)
  18. postgres=# insert into tbl values (1,2,3,4,5);
  19. INSERT 0 1
  20. postgres=# insert into tbl values (1,2,3,4,6);
  21. INSERT 0 1
  22. postgres=# insert into tbl values (2,3,4,5,6);
  23. INSERT 0 1
  24. postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);
  25. c1 | count | grouping
  26. ----+-------+----------
  27. 1 | 2 | 0
  28. 1 | 2 | 0
  29. 1 | 2 | 1
  30. 2 | 1 | 0
  31. 2 | 1 | 0
  32. 2 | 1 | 1
  33. | 3 | 1
  34. 1 | 2 | 1
  35. | 2 | 1
  36. 2 | 1 | 1
  37. | 1 | 1
  38. | 2 | 0
  39. | 2 | 0
  40. | 1 | 0
  41. | 1 | 0
  42. (15 rows)

grouping中包含未参与聚合的列将报错:

  1. postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);
  2. ERROR: arguments to GROUPING must be grouping expressions of the associated query level
  3. LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...
  4. ^
  5. postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);
  6. ERROR: arguments to GROUPING must be grouping expressions of the associated query level
  7. LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...
  8. ^
  9. postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);
  10. c1 | count | grouping
  11. ----+-------+----------
  12. 1 | 2 | 0
  13. 1 | 2 | 1
  14. 1 | 2 | 3
  15. 2 | 1 | 0
  16. 2 | 1 | 1
  17. 2 | 1 | 3
  18. | 3 | 7
  19. 1 | 2 | 2
  20. | 2 | 6
  21. 2 | 1 | 2
  22. | 1 | 6
  23. | 2 | 4
  24. | 2 | 5
  25. | 1 | 4
  26. | 1 | 5
  27. (15 rows)

2. 子查询

select子查询只能返回一列

  1. postgres=# select (select * from (values (1,2),(2,3)) as t(c1,c2)) , relname, relkind from pg_class;
  2. ERROR: subquery must return only one column
  3. LINE 1: select (select * from (values (1,2),(2,3)) as t(c1,c2)) , re...
  4. ^

select子查询只能返回一条记录

  1. postgres=# select (select * from (values (1),(2)) as t(c1)) , relname, relkind from pg_class;
  2. ERROR: more than one row returned by a subquery used as an expression

子查询可以用在select子句也可以用在源中,还可以用在with, update from语句

  1. postgres=# select (select * from (values (1),(2)) as t(c1) limit 1) , relname, relkind from pg_class;
  2. c1 | relname | relkind
  3. ----+-----------------------------------------------+---------
  4. 1 | pg_type | r
  5. 1 | pg_toast_187550 | t
  6. 1 | new_type | c
  7. 1 | pg_toast_187550_index | i
  8. 1 | test | r
  9. 1 | pg_toast_187556 | t
  10. postgres=# select t.relname from (select * from pg_class limit 1) t , pg_class where t.relname=pg_class.relname;
  11. relname
  12. ---------
  13. pg_type
  14. (1 row)

3. union\union all\except\intersect

两条或者多条查询结果的合并、去重合并、相交、求差。

union all(不去重)

  1. postgres=# select * from (values (1),(1)) t(id) union all select * from (values (2),(2)) t(id);
  2. id
  3. ----
  4. 1
  5. 1
  6. 2
  7. 2
  8. (4 rows)

union(去重)

  1. postgres=# select * from (values (1),(1)) t(id) union select * from (values (2),(2)) t(id);
  2. id
  3. ----
  4. 1
  5. 2
  6. (2 rows)

except(去重)

  1. postgres=# select * from (values (1),(1),(2)) t(id) except select * from (values (2),(2)) t(id);
  2. id
  3. ----
  4. 1
  5. (1 row)

intersect(去重)

  1. postgres=# select * from (values (1),(1),(2)) t(id) intersect select * from (values (2),(2)) t(id);
  2. id
  3. ----
  4. 2
  5. (1 row)

4. 自连接

当某个表或者子句的A字段要和B字段进行关联时,可以使用自关联。

  1. postgres=# create table tbl8(c1 int, c2 int, info text);
  2. CREATE TABLE
  3. postgres=# insert into tbl8 select generate_series(1,100), generate_series(2,101),md5(random()::text) ;
  4. INSERT 0 100
  5. postgres=# select t1.* from tbl8 t1, tbl8 t2 where t1.c1=t2.c2 and t2.c2<10 limit 10;
  6. c1 | c2 | info
  7. ----+----+----------------------------------
  8. 2 | 3 | b54fbfd843c2343330ceaa1758882ee4
  9. 3 | 4 | df999680d87435f2d2d8d5604aca2f1e
  10. 4 | 5 | 497ad2cfa8acd1f7062bb1bd7aa7a646
  11. 5 | 6 | ef55bcdc85dfa8d54978e4c2085ec55a
  12. 6 | 7 | 45f60117d6d577389707b22823a513e5
  13. 7 | 8 | ebde16ccaeced9a400a1608d591a2bf0
  14. 8 | 9 | fdeb1505dd02aca33abed20531592302
  15. 9 | 10 | 010d07f3b9b05b2c36b46440b1dd92aa
  16. (8 rows)

5. 内连接

内连接,仅仅输出符合连接条件的记录。

  1. digoal=# create table tbl_join_1(id int primary key,info text);
  2. CREATE TABLE
  3. digoal=# create table tbl_join_2(id int primary key,info text);
  4. CREATE TABLE
  5. digoal=# create table tbl_join_3(id int primary key,info text);
  6. CREATE TABLE
  7. digoal=# create table tbl_join_4(id int primary key,info text);
  8. CREATE TABLE
  9. digoal=# create table tbl_join_5(id int primary key,info text);
  10. CREATE TABLE
  11. digoal=# create table tbl_join_6(id int primary key,info text);
  12. CREATE TABLE
  13. digoal=# create table tbl_join_7(id int primary key,info text);
  14. CREATE TABLE
  15. digoal=# create table tbl_join_8(id int primary key,info text);
  16. CREATE TABLE
  17. digoal=# create table tbl_join_9(id int primary key,info text);
  18. CREATE TABLE
  19. digoal=# insert into tbl_join_1 select generate_series(1,10),md5(random()::text);
  20. digoal=# insert into tbl_join_2 select generate_series(1,100),md5(random()::text);
  21. digoal=# insert into tbl_join_3 select generate_series(1,1000),md5(random()::text);
  22. digoal=# insert into tbl_join_4 select generate_series(1,10000),md5(random()::text);
  23. digoal=# insert into tbl_join_5 select generate_series(1,100000),md5(random()::text);
  24. digoal=# insert into tbl_join_6 select generate_series(1,1000000),md5(random()::text);
  25. digoal=# insert into tbl_join_7 select generate_series(1,2000000),md5(random()::text);
  26. digoal=# insert into tbl_join_8 select generate_series(1,3000000),md5(random()::text);
  27. digoal=# insert into tbl_join_9 select generate_series(1,4000000),md5(random()::text);

非显示JOIN

  1. postgres=# select t1.* from tbl_join_1 t1, tbl_join_2 t2 where t1.id=t2.id and t2.id=1;
  2. id | info
  3. ----+----------------------------------
  4. 1 | 5b6049ec7b94d17c0bd92dc4da436311
  5. (1 row)

显示JOIN

  1. postgres=# select t1.* from tbl_join_1 t1 join tbl_join_2 t2 on(t1.id=t2.id and t2.id=1);
  2. id | info
  3. ----+----------------------------------
  4. 1 | 5b6049ec7b94d17c0bd92dc4da436311
  5. (1 row)
优化器如何强制连接顺序?

当join_collapse_limit=1时, 显示JOIN的关联按SQL写法进行关联. 例如 :

以下写法非显示关联,无法强制JOIN顺序

  1. digoal=# set join_collapse_limit=1;
  2. digoal=# explain select t1.info, t5.info from tbl_join_1 t1,
  3. tbl_join_2 t2,
  4. tbl_join_3 t3,
  5. tbl_join_4 t4,
  6. tbl_join_5 t5,
  7. tbl_join_6 t6,
  8. tbl_join_7 t7,
  9. tbl_join_8 t8,
  10. tbl_join_9 t9
  11. where
  12. t1.id=t2.id and
  13. t2.id=t3.id and
  14. t3.id=t4.id and
  15. t4.id=t5.id and
  16. t5.id=t6.id and
  17. t6.id=t7.id and
  18. t7.id=t8.id and
  19. t8.id=t9.id and
  20. t9.id=10000;
  21. QUERY PLAN
  22. --------------------------------------------------------------------------------------------------------------------------------------
  23. Nested Loop (cost=2.72..25.99 rows=1 width=65)
  24. -> Nested Loop (cost=2.29..22.93 rows=1 width=69)
  25. -> Nested Loop (cost=1.86..19.87 rows=1 width=69)
  26. -> Nested Loop (cost=1.43..16.82 rows=1 width=69)
  27. -> Nested Loop (cost=1.00..13.76 rows=1 width=69)
  28. -> Nested Loop (cost=0.71..10.84 rows=1 width=36)
  29. -> Nested Loop (cost=0.43..7.93 rows=1 width=36)
  30. -> Nested Loop (cost=0.15..5.03 rows=1 width=36)
  31. -> Index Scan using tbl_join_1_pkey on tbl_join_1 t1 (cost=0.15..2.77 rows=1 width=36)
  32. Index Cond: (id = 10000)
  33. -> Seq Scan on tbl_join_2 t2 (cost=0.00..2.25 rows=1 width=4)
  34. Filter: (id = 10000)
  35. -> Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3 (cost=0.28..2.89 rows=1 width=4)
  36. Index Cond: (id = 10000)
  37. -> Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4 (cost=0.29..2.90 rows=1 width=4)
  38. Index Cond: (id = 10000)
  39. -> Index Scan using tbl_join_5_pkey on tbl_join_5 t5 (cost=0.29..2.91 rows=1 width=37)
  40. Index Cond: (id = 10000)
  41. -> Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6 (cost=0.42..3.04 rows=1 width=4)
  42. Index Cond: (id = 10000)
  43. -> Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7 (cost=0.43..3.04 rows=1 width=4)
  44. Index Cond: (id = 10000)
  45. -> Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8 (cost=0.43..3.05 rows=1 width=4)
  46. Index Cond: (id = 10000)
  47. -> Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9 (cost=0.43..3.05 rows=1 width=4)
  48. Index Cond: (id = 10000)
  49. (26 rows)
  50. Time: 8.398 ms

以下写法显示关联,可以强制JOIN顺序

  1. digoal=# set join_collapse_limit=1;
  2. digoal=# explain select t1.info, t5.info from
  3. tbl_join_1 t1 join tbl_join_2 t2 on (t1.id=t2.id)
  4. join tbl_join_3 t3 on (t2.id=t3.id)
  5. join tbl_join_4 t4 on (t3.id=t4.id)
  6. join tbl_join_5 t5 on (t4.id=t5.id)
  7. join tbl_join_6 t6 on (t5.id=t6.id)
  8. join tbl_join_7 t7 on (t6.id=t7.id)
  9. join tbl_join_8 t8 on (t7.id=t8.id)
  10. join tbl_join_9 t9 on (t8.id=t9.id)
  11. where t9.id=10000;
  12. QUERY PLAN
  13. --------------------------------------------------------------------------------------------------------------------------------------
  14. Nested Loop (cost=2.72..25.99 rows=1 width=65)
  15. -> Nested Loop (cost=2.29..22.93 rows=1 width=69)
  16. -> Nested Loop (cost=1.86..19.87 rows=1 width=69)
  17. -> Nested Loop (cost=1.43..16.82 rows=1 width=69)
  18. -> Nested Loop (cost=1.00..13.76 rows=1 width=69)
  19. -> Nested Loop (cost=0.71..10.84 rows=1 width=36)
  20. -> Nested Loop (cost=0.43..7.93 rows=1 width=36)
  21. -> Nested Loop (cost=0.15..5.03 rows=1 width=36)
  22. -> Index Scan using tbl_join_1_pkey on tbl_join_1 t1 (cost=0.15..2.77 rows=1 width=36)
  23. Index Cond: (id = 10000)
  24. -> Seq Scan on tbl_join_2 t2 (cost=0.00..2.25 rows=1 width=4)
  25. Filter: (id = 10000)
  26. -> Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3 (cost=0.28..2.89 rows=1 width=4)
  27. Index Cond: (id = 10000)
  28. -> Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4 (cost=0.29..2.90 rows=1 width=4)
  29. Index Cond: (id = 10000)
  30. -> Index Scan using tbl_join_5_pkey on tbl_join_5 t5 (cost=0.29..2.91 rows=1 width=37)
  31. Index Cond: (id = 10000)
  32. -> Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6 (cost=0.42..3.04 rows=1 width=4)
  33. Index Cond: (id = 10000)
  34. -> Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7 (cost=0.43..3.04 rows=1 width=4)
  35. Index Cond: (id = 10000)
  36. -> Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8 (cost=0.43..3.05 rows=1 width=4)
  37. Index Cond: (id = 10000)
  38. -> Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9 (cost=0.43..3.05 rows=1 width=4)
  39. Index Cond: (id = 10000)
  40. (26 rows)
  41. Time: 0.829 ms

设置join_collapse_limit=1后, 按照SQL写法进行关联. 执行计划的时间也缩短了.

6. 外连接

左外连接

左边的表,即使没有匹配,也输出

  1. postgres=# create table tab1(id int, info text, crt_time timestamp);
  2. CREATE TABLE
  3. Time: 26.056 ms
  4. postgres=# create table tab2(id int, info text, crt_time timestamp);
  5. CREATE TABLE
  6. Time: 36.215 ms
  7. postgres=# insert into tab1 values (1,'test',now());
  8. INSERT 0 1
  9. Time: 0.520 ms
  10. postgres=# insert into tab1 values (2,'test',now());
  11. INSERT 0 1
  12. Time: 0.297 ms
  13. postgres=# insert into tab2 values (2,'test',now());
  14. INSERT 0 1
  15. Time: 11.325 ms
  16. postgres=# insert into tab2 values (3,'test',now());
  17. INSERT 0 1
  18. Time: 0.352 ms
  19. postgres=# select * from tab1;
  20. id | info | crt_time
  21. ----+------+----------------------------
  22. 1 | test | 2017-04-11 17:48:29.37083
  23. 2 | test | 2017-04-11 17:48:32.742795
  24. (2 rows)
  25. Time: 0.506 ms
  26. postgres=# select * from tab2;
  27. id | info | crt_time
  28. ----+------+----------------------------
  29. 2 | test | 2017-04-11 17:48:39.722821
  30. 3 | test | 2017-04-11 17:48:41.901834
  31. (2 rows)
  32. Time: 0.335 ms

右表没有被匹配时,输出NULL

  1. postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id);
  2. id | info | crt_time | id | info | crt_time
  3. ----+------+----------------------------+----+------+----------------------------
  4. 1 | test | 2017-04-11 17:48:29.37083 | | |
  5. 2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
  6. (2 rows)

过滤在A表但是不在B表的记录。

  1. postgres=# select tab1.* from tab1 left join tab2 on (tab1.id=tab2.id) where tab2.* is null;
  2. id | info | crt_time
  3. ----+------+---------------------------
  4. 1 | test | 2017-04-11 17:48:29.37083
  5. (1 row)
  6. postgres=# select * from tab1 where id in (select id from tab1 except select id from tab2);
  7. id | info | crt_time
  8. ----+------+---------------------------
  9. 1 | test | 2017-04-11 17:48:29.37083
  10. (1 row)

通过外部where filter,可以过滤连接=false的记录

  1. postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;
  2. id | info | crt_time | id | info | crt_time
  3. ----+------+---------------------------+----+------+----------
  4. 1 | test | 2017-04-11 17:48:29.37083 | | |
  5. (1 row)
  6. postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;
  7. QUERY PLAN
  8. ------------------------------------------------------------------
  9. Nested Loop Left Join (cost=0.00..48.81 rows=36 width=88)
  10. Join Filter: (tab1.id = tab2.id)
  11. -> Seq Scan on tab1 (cost=0.00..24.12 rows=6 width=44)
  12. Filter: (id = 1)
  13. -> Materialize (cost=0.00..24.16 rows=6 width=44)
  14. -> Seq Scan on tab2 (cost=0.00..24.12 rows=6 width=44)
  15. Filter: (id = 1)
  16. (7 rows)

join内的条件,不会过滤未连接的记录

  1. postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);
  2. id | info | crt_time | id | info | crt_time
  3. ----+------+----------------------------+----+------+----------
  4. 1 | test | 2017-04-11 17:48:29.37083 | | |
  5. 2 | test | 2017-04-11 17:48:32.742795 | | |
  6. (2 rows)
  7. postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);
  8. QUERY PLAN
  9. ---------------------------------------------------------------------
  10. Hash Left Join (cost=35.42..298.96 rows=1130 width=88)
  11. Hash Cond: (tab1.id = tab2.id)
  12. Join Filter: (tab1.id = 1)
  13. -> Seq Scan on tab1 (cost=0.00..21.30 rows=1130 width=44)
  14. -> Hash (cost=21.30..21.30 rows=1130 width=44)
  15. -> Seq Scan on tab2 (cost=0.00..21.30 rows=1130 width=44)
  16. (6 rows)
右外连接

左表没有被匹配时,输出NULL

  1. postgres=# select * from tab1 right join tab2 on (tab1.id=tab2.id);
  2. id | info | crt_time | id | info | crt_time
  3. ----+------+----------------------------+----+------+----------------------------
  4. 2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
  5. | | | 3 | test | 2017-04-11 17:48:41.901834
  6. (2 rows)
全外连接

没有被匹配时,输出NULL

  1. postgres=# select * from tab1 full join tab2 on (tab1.id=tab2.id);
  2. id | info | crt_time | id | info | crt_time
  3. ----+------+----------------------------+----+------+----------------------------
  4. 1 | test | 2017-04-11 17:48:29.37083 | | |
  5. 2 | test | 2017-04-11 17:48:32.742795 | 2 | test | 2017-04-11 17:48:39.722821
  6. | | | 3 | test | 2017-04-11 17:48:41.901834
  7. (3 rows)

7. 窗口查询

https://www.postgresql.org/docs/9.6/static/functions-window.html

常用窗口函数

FunctionReturn TypeDescription
row_number()bigintnumber of the current row within its partition, counting from 1
rank()bigintrank of the current row with gaps; same as row_number of its first peer
dense_rank()bigintrank of the current row without gaps; this function counts peer groups
percent_rank()double precisionrelative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist()double precisionrelative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile(num_buckets integer)integerinteger ranging from 1 to the argument value, dividing the partition as equally as possible
lag(value anyelement [, offset integer [, default anyelement ]])same type as valuereturns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
lead(value anyelement [, offset integer [, default anyelement ]])same type as valuereturns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
first_value(value any)same type as valuereturns value evaluated at the row that is the first row of the window frame
last_value(value any)same type as valuereturns value evaluated at the row that is the last row of the window frame
nth_value(value any, nth integer)same type as valuereturns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

其他函数也可用在窗口语法中

例子

  1. CREATE TABLE window_test(id int, name text, subject text, score numeric);
  2. INSERT INTO window_test VALUES(1,'digoal','数学',99.5),
  3. (2,'digoal','语文',89.5),
  4. (3,'digoal','英语',79.5),
  5. (4,'digoal','物理',99.5),
  6. (5,'digoal','化学',98.5),
  7. (6,'刘德华','数学',89.5),
  8. (7,'刘德华','语文',99.5),
  9. (8,'刘德华','英语',79.5),
  10. (9,'刘德华','物理',89.5),
  11. (10,'刘德华','化学',69.5),
  12. (11,'张学友','数学',89.5),
  13. (12,'张学友','语文',91.5),
  14. (13,'张学友','英语',92.5),
  15. (14,'张学友','物理',93.5),
  16. (15,'张学友','化学',94.5);

-- 取出每门课程的第一名.

  1. SELECT id,name,subject,score FROM
  2. (SELECT row_number() OVER (PARTITION BY subject ORDER BY score DESC) AS rn,* FROM window_test) AS t
  3. WHERE rn=1 ORDER BY SUBJECT;
  4. id | name | subject | score
  5. ----+--------+---------+-------
  6. 5 | digoal | 化学 | 98.5
  7. 1 | digoal | 数学 | 99.5
  8. 4 | digoal | 物理 | 99.5
  9. 13 | 张学友 | 英语 | 92.5
  10. 7 | 刘德华 | 语文 | 99.5

每个人每门成绩与第一名的差距,第一名,排名

  1. postgres=# SELECT row_number() over(partition by subject order by score desc) as ord,
  2. id,name,subject,score,max(score) over (partition by subject) as top1,
  3. max(score) over (partition by subject) - score as diff from window_test ;
  4. ord | id | name | subject | score | top1 | diff
  5. -----+----+--------+---------+-------+------+------
  6. 1 | 5 | digoal | 化学 | 98.5 | 98.5 | 0.0
  7. 2 | 15 | 张学友 | 化学 | 94.5 | 98.5 | 4.0
  8. 3 | 10 | 刘德华 | 化学 | 69.5 | 98.5 | 29.0
  9. 1 | 1 | digoal | 数学 | 99.5 | 99.5 | 0.0
  10. 2 | 11 | 张学友 | 数学 | 89.5 | 99.5 | 10.0
  11. 3 | 6 | 刘德华 | 数学 | 89.5 | 99.5 | 10.0
  12. 1 | 4 | digoal | 物理 | 99.5 | 99.5 | 0.0
  13. 2 | 14 | 张学友 | 物理 | 93.5 | 99.5 | 6.0
  14. 3 | 9 | 刘德华 | 物理 | 89.5 | 99.5 | 10.0
  15. 1 | 13 | 张学友 | 英语 | 92.5 | 92.5 | 0.0
  16. 2 | 3 | digoal | 英语 | 79.5 | 92.5 | 13.0
  17. 3 | 8 | 刘德华 | 英语 | 79.5 | 92.5 | 13.0
  18. 1 | 7 | 刘德华 | 语文 | 99.5 | 99.5 | 0.0
  19. 2 | 12 | 张学友 | 语文 | 91.5 | 99.5 | 8.0
  20. 3 | 2 | digoal | 语文 | 89.5 | 99.5 | 10.0
  21. (15 rows)
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/492978
推荐阅读
相关标签
  

闽ICP备14008679号