当前位置:   article > 正文

PostgreSQL 聚合函数讲解 - 3 总体|样本 方差, 标准方差_pg数据库求方差函数

pg数据库求方差函数

https://yq.aliyun.com/articles/147?spm=a2c4e.11153940.blogcont148.23.1526746cjd2A4Y

摘要: PostgreSQL自带了一些常用的统计学聚合函数, 非常好用. 本文介绍一下方差和标准差的一些聚合函数. 总体方差 : population covariance 总体标准差 : population standard deviation 样本方差 : sample covariance 样本...

  1. PostgreSQL自带了一些常用的统计学聚合函数, 非常好用.
  2. 本文介绍一下方差和标准差的一些聚合函数.
  3. 总体方差 : population covariance
  4. 总体标准差 : population standard deviation
  5. 样本方差 : sample covariance
  6. 样本标准差 : sample standard deviation
  7. 均值 : mean
  8. 样本均值和样本方差的介绍 :
  9. http://en.wikipedia.org/wiki/Sample_mean_and_sample_covariance
  10. 均值介绍 :
  11. http://en.wikipedia.org/wiki/Mean
  12. 对方差, 标准差, 均值不了解的话, 建议参考网易公开课, 统计学.
  13. 浅显易懂.
  14. http://v.163.com/special/Khan/khstatistics.html
  15. http://v.ku6.com/playlist/index_6598382.html
  16. PostgreSQL计算方差, 标准差的聚合函数如下 :
  17. http://www.postgresql.org/docs/devel/static/functions-aggregate.html
  18. 其中stddev和variance是stddev_samp和var_samp的别名.
  19. 这些函数用于计算数据集的总体/样本 方差,总体/样本 标准差.
  20. 例如 :
  21. 1,2,3,100 这组数据共4个值, 总体均值和样本均值分别为 :
  22. (1+2+3+100)/4 = 26.5
  23. 总体方差 : ((1-26.5)^2 + (2-26.5)^2 + (3-26.5)^2 + (100-26.5)^2)/4 = 1801.25
  24. 样本方差 : ((1-26.5)^2 + (2-26.5)^2 + (3-26.5)^2 + (100-26.5)^2)/(4-1) = 2401.6666....
  25. 总体标准差 : 平方根(总体方差) = 42.4411357058220109
  26. 样本标准差 : 平方根(样本方差) = 49.0068022489395513
  27. 使用PostgreSQL计算如下 :
  28. postgres=# select variance(id) from (values(1),(2),(3),(100)) as t(id);
  29. variance
  30. -----------------------
  31. 2401.6666666666666667
  32. (1 row)
  33. postgres=# select var_pop(id) from (values(1),(2),(3),(100)) as t(id);
  34. var_pop
  35. -----------------------
  36. 1801.2500000000000000
  37. (1 row)
  38. postgres=# select var_samp(id) from (values(1),(2),(3),(100)) as t(id);
  39. var_samp
  40. -----------------------
  41. 2401.6666666666666667
  42. (1 row)
  43. postgres=# select stddev(id) from (values(1),(2),(3),(100)) as t(id);
  44. stddev
  45. ---------------------
  46. 49.0068022489395513
  47. (1 row)
  48. postgres=# select stddev_pop(id) from (values(1),(2),(3),(100)) as t(id);
  49. stddev_pop
  50. ---------------------
  51. 42.4411357058220109
  52. (1 row)
  53. postgres=# select stddev_samp(id) from (values(1),(2),(3),(100)) as t(id);
  54. stddev_samp
  55. ---------------------
  56. 49.0068022489395513
  57. (1 row)
  58. [参考]
  59. 1. src/backend/utils/adt/float.c
  60. /*
  61. * =========================
  62. * FLOAT AGGREGATE OPERATORS
  63. * =========================
  64. *
  65. * float8_accum - accumulate for AVG(), variance aggregates, etc.
  66. * float4_accum - same, but input data is float4
  67. * float8_avg - produce final result for float AVG()
  68. * float8_var_samp - produce final result for float VAR_SAMP()
  69. * float8_var_pop - produce final result for float VAR_POP()
  70. * float8_stddev_samp - produce final result for float STDDEV_SAMP()
  71. * float8_stddev_pop - produce final result for float STDDEV_POP()
  72. *
  73. * The transition datatype for all these aggregates is a 3-element array
  74. * of float8, holding the values N, sum(X), sum(X*X) in that order.
  75. *
  76. * Note that we represent N as a float to avoid having to build a special
  77. * datatype. Given a reasonable floating-point implementation, there should
  78. * be no accuracy loss unless N exceeds 2 ^ 52 or so (by which time the
  79. * user will have doubtless lost interest anyway...)
  80. */
  81. ..................
  82. Datum
  83. float8_var_pop(PG_FUNCTION_ARGS)
  84. {
  85. ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
  86. float8 *transvalues;
  87. float8 N,
  88. sumX,
  89. sumX2,
  90. numerator;
  91. transvalues = check_float8_array(transarray, "float8_var_pop", 3);
  92. N = transvalues[0];
  93. sumX = transvalues[1];
  94. sumX2 = transvalues[2];
  95. /* Population variance is undefined when N is 0, so return NULL */
  96. if (N == 0.0)
  97. PG_RETURN_NULL();
  98. numerator = N * sumX2 - sumX * sumX;
  99. CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
  100. /* Watch out for roundoff error producing a negative numerator */
  101. if (numerator <= 0.0)
  102. PG_RETURN_FLOAT8(0.0);
  103. PG_RETURN_FLOAT8(numerator / (N * N));
  104. }
  105. Datum
  106. float8_var_samp(PG_FUNCTION_ARGS)
  107. {
  108. ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
  109. float8 *transvalues;
  110. float8 N,
  111. sumX,
  112. sumX2,
  113. numerator;
  114. transvalues = check_float8_array(transarray, "float8_var_samp", 3);
  115. N = transvalues[0];
  116. sumX = transvalues[1];
  117. sumX2 = transvalues[2];
  118. /* Sample variance is undefined when N is 0 or 1, so return NULL */
  119. if (N <= 1.0)
  120. PG_RETURN_NULL();
  121. numerator = N * sumX2 - sumX * sumX;
  122. CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
  123. /* Watch out for roundoff error producing a negative numerator */
  124. if (numerator <= 0.0)
  125. PG_RETURN_FLOAT8(0.0);
  126. PG_RETURN_FLOAT8(numerator / (N * (N - 1.0)));
  127. }
  128. Datum
  129. float8_stddev_pop(PG_FUNCTION_ARGS)
  130. {
  131. ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
  132. float8 *transvalues;
  133. float8 N,
  134. sumX,
  135. sumX2,
  136. numerator;
  137. transvalues = check_float8_array(transarray, "float8_stddev_pop", 3);
  138. N = transvalues[0];
  139. sumX = transvalues[1];
  140. sumX2 = transvalues[2];
  141. /* Population stddev is undefined when N is 0, so return NULL */
  142. if (N == 0.0)
  143. PG_RETURN_NULL();
  144. numerator = N * sumX2 - sumX * sumX;
  145. CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
  146. /* Watch out for roundoff error producing a negative numerator */
  147. if (numerator <= 0.0)
  148. PG_RETURN_FLOAT8(0.0);
  149. PG_RETURN_FLOAT8(sqrt(numerator / (N * N)));
  150. }
  151. Datum
  152. float8_stddev_samp(PG_FUNCTION_ARGS)
  153. {
  154. ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0);
  155. float8 *transvalues;
  156. float8 N,
  157. sumX,
  158. sumX2,
  159. numerator;
  160. transvalues = check_float8_array(transarray, "float8_stddev_samp", 3);
  161. N = transvalues[0];
  162. sumX = transvalues[1];
  163. sumX2 = transvalues[2];
  164. /* Sample stddev is undefined when N is 0 or 1, so return NULL */
  165. if (N <= 1.0)
  166. PG_RETURN_NULL();
  167. numerator = N * sumX2 - sumX * sumX;
  168. CHECKFLOATVAL(numerator, isinf(sumX2) || isinf(sumX), true);
  169. /* Watch out for roundoff error producing a negative numerator */
  170. if (numerator <= 0.0)
  171. PG_RETURN_FLOAT8(0.0);
  172. PG_RETURN_FLOAT8(sqrt(numerator / (N * (N - 1.0))));
  173. }
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/Monodyee/article/detail/492976
推荐阅读
相关标签
  

闽ICP备14008679号