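Metric and bucket aggregations work directly on document fields (typically numeric ones), whereas the pipeline aggregations discussed in this article work on the output of other aggregations. In other words, pipeline aggregations operate on intermediate values rather than on the original document data. This makes them very useful for computing complex statistical and mathematical measures such as cumulative sums, derivatives (rates of change), and moving averages.
This article covers the two basic types of pipeline aggregation and walks through examples of commonly used ones: average, derivative, maximum, minimum, sum, and cumulative sum.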
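Pipeline aggregations fall into two categories: parent and sibling.
A parent pipeline aggregation works on the output of its parent aggregation. It takes the parent's values, computes new buckets or new aggregation values, and adds them to the existing buckets. The derivative and cumulative sum aggregations are two common examples.
A sibling pipeline aggregation, by contrast, works on the output of a sibling aggregation. It takes that output and computes a new aggregation at the same level as the sibling.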
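A pipeline aggregation needs a path to the parent or sibling aggregation it reads from. The buckets_path parameter supplies this path to the required metric. It follows this syntax: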
AGG_SEPARATOR = '>' ;
METRIC_SEPARATOR = '.' ;
AGG_NAME = <the name of the aggregation> ;
METRIC = <the name of the metric (in case of multi-value metrics aggregation)> ;
PATH = <AGG_NAME> [ <AGG_SEPARATOR>, <AGG_NAME> ]* [ <METRIC_SEPARATOR>, <METRIC> ] ;
For example, my_bucket>my_stats.sum refers to the sum value of the my_stats metric, which is contained in the my_bucket bucket aggregation.
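To make the grammar concrete, here is a minimal Python sketch that splits a buckets_path string into its aggregation names and optional metric. The function name parse_buckets_path is hypothetical and not part of any Elasticsearch client, and the sketch covers only the grammar shown above, not any other path forms Elasticsearch may accept.

```python
def parse_buckets_path(path):
    """Split a buckets_path into (aggregation names, metric or None).

    Hypothetical helper for illustration; it only covers the grammar above.
    """
    # '>' separates nested aggregation names.
    *parents, last = path.split(">")
    # '.' separates the last aggregation from a metric of a multi-value aggregation.
    agg_name, _, metric = last.partition(".")
    return parents + [agg_name], (metric or None)


print(parse_buckets_path("my_bucket>my_stats.sum"))
# (['my_bucket', 'my_stats'], 'sum')
```

A path without a metric part, such as the_sum, yields a single aggregation name and None for the metric.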
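Note that paths are relative to the position of the pipeline aggregation, so a path cannot go back up the aggregation tree. For example, the following derivative pipeline aggregation is embedded in a date_histogram and references the sibling metric the_sum:
{
  "aggs": {
    "total_monthly_visits": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "the_sum": {
          "sum": { "field": "visits" }
        },
        "the_derivative": {
          "derivative": { "buckets_path": "the_sum" }
        }
      }
    }
  }
}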
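A sibling pipeline aggregation can also be placed next to a series of buckets rather than embedded inside them. In that case, reaching the required metric means specifying the full path, including the parent aggregation:
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        }
      }
    },
    "avg_monthly_visits": {
      "avg_bucket": { "buckets_path": "visits_per_month>total_visits" }
    }
  }
}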
In the example above, the total_visits metric is referenced through its parent date histogram, visits_per_month, so the full path is visits_per_month>total_visits.
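An important point to remember is that pipeline aggregations cannot have sub-aggregations. Some of them, such as the derivative aggregation, can however reference other pipeline aggregations in their buckets_path, which makes it possible to chain several pipeline aggregations. For example, chaining two first-order derivatives yields a second derivative (the derivative of a derivative, or the rate of change of the rate of change).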
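Metric and bucket aggregations handle missing data with the missing parameter. Pipeline aggregations use the gap_policy parameter instead, to deal with cases where documents lack the required field or where one or more buckets contain no documents matching the query. The parameter supports the following policies:
skip: treats missing data as if the bucket did not exist. The aggregation skips empty buckets and continues the calculation with the next available value.
insert_zeros: replaces every missing value with 0, after which the pipeline aggregation proceeds as usual.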
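The difference between the two policies can be illustrated with a small Python sketch. This is a simplified model based only on the descriptions above, not Elasticsearch's actual implementation; the helper names and the sample values are made up for illustration, and None stands for an empty bucket.

```python
def apply_gap_policy(values, gap_policy):
    """Resolve gaps (None) in a list of bucket values under a simplified model."""
    if gap_policy == "skip":
        # Empty buckets are left out of the calculation entirely.
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        # Empty buckets take part in the calculation with the value 0.
        return [0.0 if v is None else v for v in values]
    raise ValueError(f"unsupported gap_policy: {gap_policy}")


def average(values):
    return sum(values) / len(values)


bucket_values = [10.0, None, 20.0]  # made-up data; the middle bucket is empty
avg_skip = average(apply_gap_policy(bucket_values, "skip"))
avg_zeros = average(apply_gap_policy(bucket_values, "insert_zeros"))
print(avg_skip, avg_zeros)  # 15.0 10.0
```

With skip the average is taken over the two non-empty buckets only, while insert_zeros pulls the average down because the empty bucket counts as 0.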
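Test environment: Elasticsearch 7.x and Kibana 7.x. The queries below use the interval parameter of date_histogram, as in the original tests; it was deprecated in later 7.x releases in favor of calendar_interval and fixed_interval.
Create the following index, whose mapping has three fields: date, visits, and max_time_spent:
PUT /traffic_stats
{
  "mappings": {
    "properties": {
      "date": { "type": "date", "format": "dateOptionalTime" },
      "visits": { "type": "integer" },
      "max_time_spent": { "type": "integer" }
    }
  }
}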
Insert the test data:
POST /traffic_stats/_bulk
{"index":{}}
{"visits":"488", "date":"2018-10-1", "max_time_spent":"900"}
{"index":{}}
{"visits":"783", "date":"2018-10-6", "max_time_spent":"928"}
{"index":{}}
{"visits":"789", "date":"2018-10-12", "max_time_spent":"1834"}
{"index":{}}
{"visits":"1299", "date":"2018-11-3", "max_time_spent":"592"}
{"index":{}}
{"visits":"394", "date":"2018-11-6", "max_time_spent":"1249"}
{"index":{}}
{"visits":"448", "date":"2018-11-24", "max_time_spent":"874"}
{"index":{}}
{"visits":"768", "date":"2018-12-18", "max_time_spent":"876"}
{"index":{}}
{"visits":"1194", "date":"2018-12-24", "max_time_spent":"1249"}
{"index":{}}
{"visits":"987", "date":"2018-12-28", "max_time_spent":"1599"}
{"index":{}}
{"visits":"872", "date":"2019-01-1", "max_time_spent":"828"}
{"index":{}}
{"visits":"972", "date":"2019-01-5", "max_time_spent":"723"}
{"index":{}}
{"visits":"827", "date":"2019-02-5", "max_time_spent":"1300"}
{"index":{}}
{"visits":"1584", "date":"2019-02-15", "max_time_spent":"1500"}
{"index":{}}
{"visits":"1604", "date":"2019-03-2", "max_time_spent":"1488"}
{"index":{}}
{"visits":"1499", "date":"2019-03-27", "max_time_spent":"1399"}
{"index":{}}
{"visits":"1392", "date":"2019-04-8", "max_time_spent":"1294"}
{"index":{}}
{"visits":"1247", "date":"2019-04-15", "max_time_spent":"1194"}
{"index":{}}
{"visits":"984", "date":"2019-05-15", "max_time_spent":"1184"}
{"index":{}}
{"visits":"1228", "date":"2019-05-18", "max_time_spent":"1485"}
{"index":{}}
{"visits":"1423", "date":"2019-06-14", "max_time_spent":"1452"}
{"index":{}}
{"visits":"1238", "date":"2019-06-24", "max_time_spent":"1329"}
{"index":{}}
{"visits":"1388", "date":"2019-07-14", "max_time_spent":"1542"}
{"index":{}}
{"visits":"1499", "date":"2019-07-24", "max_time_spent":"1742"}
{"index":{}}
{"visits":"1523", "date":"2019-08-13", "max_time_spent":"1552"}
{"index":{}}
{"visits":"1443", "date":"2019-08-19", "max_time_spent":"1511"}
{"index":{}}
{"visits":"1587", "date":"2019-09-14", "max_time_spent":"1497"}
{"index":{}}
{"visits":"1534", "date":"2019-09-27", "max_time_spent":"1434"}
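With the environment and data ready, we start with the average bucket pipeline aggregation.
The avg_bucket aggregation is a typical sibling pipeline aggregation. It computes the average of a numeric metric across all buckets of a sibling aggregation. Two requirements apply: the sibling aggregation must be a multi-bucket aggregation, and the specified metric must be numeric.
To understand how pipeline aggregations work, it helps to split the computation into stages. The query below has three. First, Elasticsearch builds a date histogram on the date field with a monthly interval. The histogram produces one bucket per month, each containing several documents. Next, the sum sub-aggregation adds up the visits field of all documents in each monthly bucket. Finally, the average bucket pipeline aggregation references these per-bucket sums and computes their average across all buckets. The result is the average monthly number of blog visits.
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        }
      }
    },
    "avg_monthly_visits": {
      "avg_bucket": { "buckets_path": "visits_per_month>total_visits" }
    }
  }
}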
Response:
{
  "took" : 1184,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 } }
      ]
    },
    "avg_monthly_visits" : { "value" : 2582.8333333333335 }
  }
}
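The average monthly number of blog visits is 2582.83. Walking through the stages above should make the flow of a pipeline aggregation clear: it takes the intermediate results of bucket or metric aggregations and adds further computed values on top.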
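The three stages can be mimicked in plain Python. The sketch below is only an illustration of the computation, not of how Elasticsearch executes it, and it uses just the six test documents from October and November 2018 to stay short.

```python
from collections import defaultdict

# Six of the test documents (October and November 2018).
docs = [
    {"visits": 488, "date": "2018-10-1"},
    {"visits": 783, "date": "2018-10-6"},
    {"visits": 789, "date": "2018-10-12"},
    {"visits": 1299, "date": "2018-11-3"},
    {"visits": 394, "date": "2018-11-6"},
    {"visits": 448, "date": "2018-11-24"},
]

# Stages 1 and 2: bucket the documents by month, then sum visits per bucket.
total_visits = defaultdict(int)
for doc in docs:
    year, month, _day = doc["date"].split("-")
    total_visits[f"{year}-{month}"] += doc["visits"]

# Stage 3: average the per-bucket sums, as avg_bucket does.
avg_monthly_visits = sum(total_visits.values()) / len(total_visits)

print(dict(total_visits))   # {'2018-10': 2060, '2018-11': 2141}
print(avg_monthly_visits)   # 2100.5
```

The two monthly sums match the first two buckets of the response above. Over all twelve buckets the same stage 3 computation gives 30994 / 12, which is the 2582.83 reported by avg_monthly_visits.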
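The derivative aggregation is a parent pipeline aggregation that computes the derivative of a specified metric in a parent histogram or date histogram. Two requirements apply:
The metric must be numeric.
min_doc_count must be set to 0 (the default for histogram aggregations). If min_doc_count is greater than 0, some buckets are omitted, which can lead to wrong or confusing derivative values.
Mathematically, the derivative of a function measures how sensitive the function's output is to changes in its input; in other words, it measures how fast the function changes with respect to its variable. For our data, the derivative aggregation computes the change in a metric relative to the previous period. We start with the first derivative, which tells us whether the metric is rising or falling, and by how much:
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        },
        "visits_deriv": {
          "derivative": { "buckets_path": "total_visits" }
        }
      }
    }
  }
}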
buckets_path tells the derivative aggregation to use the output of the total_visits metric. Because derivative is a parent pipeline aggregation, it is placed inside the parent date histogram, next to total_visits. The response:
{
  "took" : 61,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 }, "visits_deriv" : { "value" : 81.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 }, "visits_deriv" : { "value" : 808.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 }, "visits_deriv" : { "value" : -1105.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 }, "visits_deriv" : { "value" : 567.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 }, "visits_deriv" : { "value" : 692.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 }, "visits_deriv" : { "value" : -464.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 }, "visits_deriv" : { "value" : -427.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 }, "visits_deriv" : { "value" : 449.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 }, "visits_deriv" : { "value" : 226.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 }, "visits_deriv" : { "value" : 79.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 }, "visits_deriv" : { "value" : 155.0 } }
      ]
    }
  }
}
Comparing two adjacent buckets shows that the derivative of the current bucket is the difference between its value and the value of the previous bucket. For example:
{ "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 }, "visits_deriv" : { "value" : 81.0 } },
{ "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 }, "visits_deriv" : { "value" : 808.0 } }
December's total is 2949 and November's is 2141, so December's derivative is 808, the difference between the two.
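The same differences can be reproduced outside Elasticsearch. The following Python sketch applies a plain first difference to the monthly totals taken from the response above; the function name derivative is just an illustrative choice, and None marks the first bucket, which has no predecessor.

```python
def derivative(values):
    """First difference of a sequence; the first entry has no predecessor."""
    if not values:
        return []
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


# Monthly total_visits values, October 2018 to September 2019.
total_visits = [2060.0, 2141.0, 2949.0, 1844.0, 2411.0, 3103.0,
                2639.0, 2212.0, 2661.0, 2887.0, 2966.0, 3121.0]

visits_deriv = derivative(total_visits)
print(visits_deriv[:4])  # [None, 81.0, 808.0, -1105.0]
```

The values agree with the visits_deriv entries in the response, including the missing derivative for the first bucket.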
The second derivative is the derivative of the derivative. It measures how the rate of change of a quantity is itself changing. In Elasticsearch, we can compute it by chaining one derivative pipeline aggregation to another: the first derivative is computed first, and the second derivative is then computed from it. For example:
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        },
        "visits_deriv": {
          "derivative": { "buckets_path": "total_visits" }
        },
        "visits_2nd_deriv": {
          "derivative": { "buckets_path": "visits_deriv" }
        }
      }
    }
  }
}
The first derivative uses the path total_visits, so it is computed from the sum aggregation, while the second derivative uses the path visits_deriv, which points at the first derivative. The second derivative can thus be seen as a two-stage pipeline. The response:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 }, "visits_deriv" : { "value" : 81.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 }, "visits_deriv" : { "value" : 808.0 }, "visits_2nd_deriv" : { "value" : 727.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 }, "visits_deriv" : { "value" : -1105.0 }, "visits_2nd_deriv" : { "value" : -1913.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 }, "visits_deriv" : { "value" : 567.0 }, "visits_2nd_deriv" : { "value" : 1672.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 }, "visits_deriv" : { "value" : 692.0 }, "visits_2nd_deriv" : { "value" : 125.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 }, "visits_deriv" : { "value" : -464.0 }, "visits_2nd_deriv" : { "value" : -1156.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 }, "visits_deriv" : { "value" : -427.0 }, "visits_2nd_deriv" : { "value" : 37.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 }, "visits_deriv" : { "value" : 449.0 }, "visits_2nd_deriv" : { "value" : 876.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 }, "visits_deriv" : { "value" : 226.0 }, "visits_2nd_deriv" : { "value" : -223.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 }, "visits_deriv" : { "value" : 79.0 }, "visits_2nd_deriv" : { "value" : -147.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 }, "visits_deriv" : { "value" : 155.0 }, "visits_2nd_deriv" : { "value" : 76.0 } }
      ]
    }
  }
}
Compare two adjacent buckets:
{ "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 }, "visits_deriv" : { "value" : 79.0 }, "visits_2nd_deriv" : { "value" : -147.0 } },
{ "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 }, "visits_deriv" : { "value" : 155.0 }, "visits_2nd_deriv" : { "value" : 76.0 } }
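The first derivatives for August and September are 79 and 155, so September's second derivative is their difference, 76.
In the same way, we could chain three or more derivative aggregations to compute third, fourth, or even higher-order derivatives, although these are of little value for most data. Note that the first two buckets have no second derivative, because at least two first-derivative data points are needed to compute one.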
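Chaining can be sketched in Python by applying a difference function to its own output. This is only an illustration of the arithmetic, using the monthly totals from the responses above; the function name derivative is hypothetical, and None marks buckets for which no value can be computed.

```python
def derivative(values):
    """First difference that propagates missing (None) inputs."""
    result = [None]
    for prev, cur in zip(values, values[1:]):
        result.append(None if prev is None or cur is None else cur - prev)
    return result[: len(values)]


# Monthly total_visits values, October 2018 to September 2019.
total_visits = [2060.0, 2141.0, 2949.0, 1844.0, 2411.0, 3103.0,
                2639.0, 2212.0, 2661.0, 2887.0, 2966.0, 3121.0]

visits_deriv = derivative(total_visits)      # first derivative
visits_2nd_deriv = derivative(visits_deriv)  # derivative of the derivative

print(visits_2nd_deriv[:4])  # [None, None, 727.0, -1913.0]
print(visits_2nd_deriv[-1])  # 76.0
```

The two leading None entries correspond to the first two buckets in the response, which carry no visits_2nd_deriv value.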
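The max_bucket aggregation is a sibling pipeline aggregation. It finds the bucket or buckets with the maximum value of a specified metric in a sibling aggregation and outputs both that value and the keys of the buckets. The metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
In the example below, max_bucket finds the maximum across all the monthly buckets produced by the date histogram. It uses the result of the sum aggregation total_visits, reached through the sibling aggregation visits_per_month.
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        }
      }
    },
    "max_monthly_visits": {
      "max_bucket": { "buckets_path": "visits_per_month>total_visits" }
    }
  }
}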
Response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 } }
      ]
    },
    "max_monthly_visits" : { "value" : 3121.0, "keys" : [ "2019-09-01T00:00:00.000Z" ] }
  }
}
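The sum aggregation computes the total visits for each monthly bucket, and the max bucket pipeline aggregation then finds the bucket with the highest total. The result is 3121, which belongs to the bucket for September 2019 (key 2019-09-01).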
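The min bucket aggregation works the same way. We only need to replace max_bucket with min_bucket in the query, and rename the aggregation to min_monthly_visits so that its name matches what it computes:
"min_monthly_visits": {
  "min_bucket": { "buckets_path": "visits_per_month>total_visits" }
}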
Result:
"min_monthly_visits" : {
"value" : 1844.0,
"keys" : [
"2019-01-01T00:00:00.000Z"
]
}
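As a rough Python illustration of what max_bucket and min_bucket return, the sketch below picks the extreme value from the monthly totals and collects every key that carries it. The helper name extreme_bucket and the shortened month keys are assumptions made for this example only.

```python
def extreme_bucket(totals, pick):
    """Return the extreme value chosen by `pick` and all bucket keys holding it."""
    best = pick(totals.values())
    keys = [key for key, value in totals.items() if value == best]
    return {"value": best, "keys": keys}


# Monthly total_visits values from the responses above, keyed by month.
total_visits = {
    "2018-10": 2060.0, "2018-11": 2141.0, "2018-12": 2949.0,
    "2019-01": 1844.0, "2019-02": 2411.0, "2019-03": 3103.0,
    "2019-04": 2639.0, "2019-05": 2212.0, "2019-06": 2661.0,
    "2019-07": 2887.0, "2019-08": 2966.0, "2019-09": 3121.0,
}

max_monthly_visits = extreme_bucket(total_visits, max)
min_monthly_visits = extreme_bucket(total_visits, min)
print(max_monthly_visits)  # {'value': 3121.0, 'keys': ['2019-09']}
print(min_monthly_visits)  # {'value': 1844.0, 'keys': ['2019-01']}
```

The keys field is a list because several buckets can share the same extreme value.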
Sometimes we need the sum of all bucket values produced by another aggregation. The sum_bucket pipeline aggregation, a sibling aggregation, does this. The following query computes the sum of all monthly visit totals:
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        }
      }
    },
    "sum_monthly_visits": {
      "sum_bucket": { "buckets_path": "visits_per_month>total_visits" }
    }
  }
}
The pipeline aggregation uses the total_visits metric of the sibling aggregation, which holds the number of visits per month. The response:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 } }
      ]
    },
    "sum_monthly_visits" : { "value" : 30994.0 }
  }
}
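The sum bucket pipeline aggregation simply adds up the visits of all months, that is, it sums the intermediate results produced by the sibling sum aggregation.
The cumulative sum aggregation takes a different approach. In general, a cumulative sum is the sequence of partial sums of a given sequence. For example, the cumulative sums of the sequence {a, b, c, …} are a, a+b, a+b+c, …
The cumulative_sum aggregation is a parent pipeline aggregation that computes the cumulative sum of a specified metric in a parent histogram (or date histogram) aggregation. As with other parent pipeline aggregations, the metric must be numeric and the histogram's min_doc_count parameter must be 0 (the default).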
GET /traffic_stats/_search?size=0
{
  "aggs": {
    "visits_per_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "total_visits": {
          "sum": { "field": "visits" }
        },
        "cumulative_visits": {
          "cumulative_sum": { "buckets_path": "total_visits" }
        }
      }
    }
  }
}
Response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 27, "relation" : "eq" }, "max_score" : null, "hits" : [ ] },
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        { "key_as_string" : "2018-10-01T00:00:00.000Z", "key" : 1538352000000, "doc_count" : 3, "total_visits" : { "value" : 2060.0 }, "cumulative_visits" : { "value" : 2060.0 } },
        { "key_as_string" : "2018-11-01T00:00:00.000Z", "key" : 1541030400000, "doc_count" : 3, "total_visits" : { "value" : 2141.0 }, "cumulative_visits" : { "value" : 4201.0 } },
        { "key_as_string" : "2018-12-01T00:00:00.000Z", "key" : 1543622400000, "doc_count" : 3, "total_visits" : { "value" : 2949.0 }, "cumulative_visits" : { "value" : 7150.0 } },
        { "key_as_string" : "2019-01-01T00:00:00.000Z", "key" : 1546300800000, "doc_count" : 2, "total_visits" : { "value" : 1844.0 }, "cumulative_visits" : { "value" : 8994.0 } },
        { "key_as_string" : "2019-02-01T00:00:00.000Z", "key" : 1548979200000, "doc_count" : 2, "total_visits" : { "value" : 2411.0 }, "cumulative_visits" : { "value" : 11405.0 } },
        { "key_as_string" : "2019-03-01T00:00:00.000Z", "key" : 1551398400000, "doc_count" : 2, "total_visits" : { "value" : 3103.0 }, "cumulative_visits" : { "value" : 14508.0 } },
        { "key_as_string" : "2019-04-01T00:00:00.000Z", "key" : 1554076800000, "doc_count" : 2, "total_visits" : { "value" : 2639.0 }, "cumulative_visits" : { "value" : 17147.0 } },
        { "key_as_string" : "2019-05-01T00:00:00.000Z", "key" : 1556668800000, "doc_count" : 2, "total_visits" : { "value" : 2212.0 }, "cumulative_visits" : { "value" : 19359.0 } },
        { "key_as_string" : "2019-06-01T00:00:00.000Z", "key" : 1559347200000, "doc_count" : 2, "total_visits" : { "value" : 2661.0 }, "cumulative_visits" : { "value" : 22020.0 } },
        { "key_as_string" : "2019-07-01T00:00:00.000Z", "key" : 1561939200000, "doc_count" : 2, "total_visits" : { "value" : 2887.0 }, "cumulative_visits" : { "value" : 24907.0 } },
        { "key_as_string" : "2019-08-01T00:00:00.000Z", "key" : 1564617600000, "doc_count" : 2, "total_visits" : { "value" : 2966.0 }, "cumulative_visits" : { "value" : 27873.0 } },
        { "key_as_string" : "2019-09-01T00:00:00.000Z", "key" : 1567296000000, "doc_count" : 2, "total_visits" : { "value" : 3121.0 }, "cumulative_visits" : { "value" : 30994.0 } }
      ]
    }
  }
}
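The aggregation first adds the values of the first two buckets, then adds the result to the value of the next bucket, and so on. In this way it accumulates the sums of all buckets in the sequence.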
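The running total can be reproduced with the standard library's itertools.accumulate. The sketch below only illustrates the arithmetic, using the monthly totals from the response above.

```python
from itertools import accumulate

# Monthly total_visits values, October 2018 to September 2019.
total_visits = [2060.0, 2141.0, 2949.0, 1844.0, 2411.0, 3103.0,
                2639.0, 2212.0, 2661.0, 2887.0, 2966.0, 3121.0]

# Each entry is the sum of all bucket values up to and including that bucket.
cumulative_visits = list(accumulate(total_visits))

print(cumulative_visits[:3])  # [2060.0, 4201.0, 7150.0]
print(cumulative_visits[-1])  # 30994.0
```

The last running total, 30994, is the sum of all twelve monthly totals, so it equals what a sum over all buckets would produce.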
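Pipeline aggregations make it possible to carry out complex computations over the intermediate results of other aggregations. Measures such as derivatives, second derivatives, and moving averages usually cannot be computed directly from document data; they require several intermediate steps, which is exactly what pipeline aggregations provide.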