赞
踩
https://yq.aliyun.com/articles/147?spm=a2c4e.11153940.blogcont148.23.1526746cjd2A4Y
摘要: PostgreSQL自带了一些常用的统计学聚合函数, 非常好用. 本文介绍一下方差和标准差的一些聚合函数. 总体方差 : population covariance 总体标准差 : population standard deviation 样本方差 : sample covariance 样本...
- PostgreSQL自带了一些常用的统计学聚合函数, 非常好用.
- 本文介绍一下方差和标准差的一些聚合函数.
-
-
- 总体方差 : population covariance
- 总体标准差 : population standard deviation
- 样本方差 : sample covariance
- 样本标准差 : sample standard deviation
- 均值 : mean
-
- 样本均值和样本方差的介绍 :
- http://en.wikipedia.org/wiki/Sample_mean_and_sample_covariance
- 均值介绍 :
- http://en.wikipedia.org/wiki/Mean
-
- 对方差, 标准差, 均值不了解的话, 建议参考网易公开课, 统计学.
- 浅显易懂.
- http://v.163.com/special/Khan/khstatistics.html
- http://v.ku6.com/playlist/index_6598382.html
-
- PostgreSQL计算方差, 标准差的聚合函数如下 :
- http://www.postgresql.org/docs/devel/static/functions-aggregate.html
-
- 其中stddev和variance是stddev_samp和var_samp的别名.
- 这些函数用于计算数据集的总体/样本 方差,总体/样本 标准差.
- 例如 :
- 1,2,3,100 这组数据共4个值, 总体均值和样本均值分别为 :
- (1+2+3+100)/4 = 26.5
- 总体方差 : ((1-26.5)^2 + (2-26.5)^2 + (3-26.5)^2 + (100-26.5)^2)/4 = 1801.25
- 样本方差 : ((1-26.5)^2 + (2-26.5)^2 + (3-26.5)^2 + (100-26.5)^2)/(4-1) = 2401.6666....
- 总体标准差 : 平方根(总体方差) = 42.4411357058220109
- 样本标准差 : 平方根(样本方差) = 49.0068022489395513
- 使用PostgreSQL计算如下 :
- postgres=# select variance(id) from (values(1),(2),(3),(100)) as t(id);
- variance
- -----------------------
- 2401.6666666666666667
- (1 row)
- postgres=# select var_pop(id) from (values(1),(2),(3),(100)) as t(id);
- var_pop
- -----------------------
- 1801.2500000000000000
- (1 row)
- postgres=# select var_samp(id) from (values(1),(2),(3),(100)) as t(id);
- var_samp
- -----------------------
- 2401.6666666666666667
- (1 row)
- postgres=# select stddev(id) from (values(1),(2),(3),(100)) as t(id);
- stddev
- ---------------------
- 49.0068022489395513
- (1 row)
- postgres=# select stddev_pop(id) from (values(1),(2),(3),(100)) as t(id);
- stddev_pop
- ---------------------
- 42.4411357058220109
- (1 row)
- postgres=# select stddev_samp(id) from (values(1),(2),(3),(100)) as t(id);
- stddev_samp
- ---------------------
- 49.0068022489395513
- (1 row)
-
- [参考]
- 1. src/backend/utils/adt/float.c
- /*
- * =========================
- * FLOAT AGGREGATE OPERATORS
- * =========================
- *
- * float8_accum - accumulate for AVG(), variance aggregates, etc.
- * float4_accum - same, but input data is float4
- * float8_avg - produce final result for float AVG()
- * float8_var_samp - produce final result for float VAR_SAMP()
- * float8_var_pop - produce final result for float VAR_POP()
- * float8_stddev_samp - produce final result for float STDDEV_SAMP()
- * float8_stddev_pop - produce final result for float STDDEV_POP()
- *
- * The transition datatype for all these aggregates is a 3-element array
- * of float8, holding the values N, sum(X), sum(X*X) in that order.
- *
- * Note that we represent N as a float to avoid having to build a special
- * datatype. Given a reasonable floating-point implementation, there should
- * be no accuracy loss unless N exceeds 2 ^ 52 or so (by which time the
- * user will have doubtless lost interest anyway...)
- */
- ..................
- Datum
- float8_var_pop(PG_FUNCTION_ARGS)
- {
- ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
- float8 *transvalues;
- float8 N,
- sumX,
- sumX2,
- numerator;
-
- transvalues = check_float8_array(transarray, "float8_var_pop", 3);
- N = transvalues[0];
- sumX = transvalues[1];
- sumX2 = transvalues[2];
-
- /* Population variance is undefined when N is 0, so return NULL */
- if (N == 0.0)
- PG_RETURN_NULL();
-
- numerator = N * sumX2 - sumX * sumX;
- CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
-
- /* Watch out for roundoff error producing a negative numerator */
- if (numerator <= 0.0)
- PG_RETURN_FLOAT8(0.0);
-
- PG_RETURN_FLOAT8(numerator / (N * N));
- }
-
- Datum
- float8_var_samp(PG_FUNCTION_ARGS)
- {
- ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
- float8 *transvalues;
- float8 N,
- sumX,
- sumX2,
- numerator;
-
- transvalues = check_float8_array(transarray, "float8_var_samp", 3);
- N = transvalues[0];
- sumX = transvalues[1];
- sumX2 = transvalues[2];
-
- /* Sample variance is undefined when N is 0 or 1, so return NULL */
- if (N <= 1.0)
- PG_RETURN_NULL();
-
- numerator = N * sumX2 - sumX * sumX;
- CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
-
- /* Watch out for roundoff error producing a negative numerator */
- if (numerator <= 0.0)
- PG_RETURN_FLOAT8(0.0);
-
- PG_RETURN_FLOAT8(numerator / (N * (N - 1.0)));
- }
-
- Datum
- float8_stddev_pop(PG_FUNCTION_ARGS)
- {
- ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
- float8 *transvalues;
- float8 N,
- sumX,
- sumX2,
- numerator;
-
- transvalues = check_float8_array(transarray, "float8_stddev_pop", 3);
- N = transvalues[0];
- sumX = transvalues[1];
- sumX2 = transvalues[2];
-
- /* Population stddev is undefined when N is 0, so return NULL */
- if (N == 0.0)
- PG_RETURN_NULL();
-
- numerator = N * sumX2 - sumX * sumX;
- CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
-
- /* Watch out for roundoff error producing a negative numerator */
- if (numerator <= 0.0)
- PG_RETURN_FLOAT8(0.0);
-
- PG_RETURN_FLOAT8(sqrt(numerator / (N * N)));
- }
-
- Datum
- float8_stddev_samp(PG_FUNCTION_ARGS)
- {
- ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
- float8 *transvalues;
- float8 N,
- sumX,
- sumX2,
- numerator;
-
- transvalues = check_float8_array(transarray, "float8_stddev_samp", 3);
- N = transvalues[0];
- sumX = transvalues[1];
- sumX2 = transvalues[2];
-
- /* Sample stddev is undefined when N is 0 or 1, so return NULL */
- if (N <= 1.0)
- PG_RETURN_NULL();
-
- numerator = N * sumX2 - sumX * sumX;
- CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
-
- /* Watch out for roundoff error producing a negative numerator */
- if (numerator <= 0.0)
- PG_RETURN_FLOAT8(0.0);
-
- PG_RETURN_FLOAT8(sqrt(numerator / (N * (N - 1.0))));
- }
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。