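Elasticsearch aggregations are used for complex analysis and statistics over data. They fall into four families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.
The test data in this article is the official sample dataset shakespeare, which can be downloaded from the Elasticsearch website. The content follows the official documentation.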
Max Aggregation finds the maximum value of a field. For example, to find the largest line_id in the shakespeare index:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "max_line_id": {
      "max": {
        "field": "line_id"
      }
    }
  }
}
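max_line_id is the name of the result and can be any string. The key nested under max_line_id selects the aggregation type: max means Max Aggregation, and field names the document field to aggregate on. Setting size to 0 suppresses the matching documents so that only the aggregation is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
Query result: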
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_line_id" : {
      "value" : 111396.0
    }
  }
}
The result is under aggregations: the maximum line_id is 111396.
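To show how the result name ties the request to the response, here is a minimal Python sketch that reads the value out of a response body. The helper read_metric is a hypothetical name, and the JSON is a trimmed copy of the query result above; no Elasticsearch client is involved.

```python
import json

# Trimmed copy of the response shown above (only the keys used here).
response_text = """
{
  "hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []},
  "aggregations": {"max_line_id": {"value": 111396.0}}
}
"""


def read_metric(response: dict, agg_name: str) -> float:
    """Return the value of a single-value metric aggregation by its result name."""
    return response["aggregations"][agg_name]["value"]


response = json.loads(response_text)
max_line_id = read_metric(response, "max_line_id")
print(max_line_id)
```

The same lookup would work for the min, avg, and sum aggregations, since they also return a single value under the name chosen in the request.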
In contrast to Max Aggregation, Min Aggregation finds the minimum value. For example, to find the smallest line_id in the shakespeare index:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "min_line_id": {
      "min": {
        "field": "line_id"
      }
    }
  }
}
The result is again under aggregations.
Avg Aggregation computes the average. For example, to compute the average of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "avg_line_id": {
      "avg": {
        "field": "line_id"
      }
    }
  }
}
The result is again under aggregations.
Sum Aggregation computes the total. For example, to compute the sum of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "sum_line_id": {
      "sum": {
        "field": "line_id"
      }
    }
  }
}
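Cardinality Aggregation counts distinct values: it works like a SQL distinct followed by a count of the resulting set. Elasticsearch computes this with the HyperLogLog++ algorithm, so on high-cardinality fields the count is approximate. For example, the following query counts the distinct plays (play_name values) in the index: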
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "player_sum": {
      "cardinality": {
        "field": "play_name.keyword"
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "player_sum" : {
      "value" : 36
    }
  }
}
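That is, the index contains 36 distinct plays. (The aggregation is named player_sum, but play_name holds the title of the play, not a character.)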
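The "distinct, then count" idea can be sketched in a few lines of Python. This is an exact count over a small in-memory list, not the approximate HyperLogLog++ algorithm Elasticsearch uses; exact_cardinality and the sample values are hypothetical.

```python
def exact_cardinality(values):
    """Count distinct values, like SELECT COUNT(DISTINCT ...) in SQL."""
    return len(set(values))


# Hypothetical play_name values, one per document.
play_names = ["Hamlet", "Hamlet", "Othello", "King Lear", "Othello", "Hamlet"]
distinct_plays = exact_cardinality(play_names)
print(distinct_plays)  # 3
```

An exact set like this needs memory proportional to the number of distinct values, which is the cost the approximate algorithm avoids on large indices.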
Stats Aggregation returns basic statistics in a single request: count, max, min, avg, and sum. For example, for line_id:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "stats": {
        "field": "line_id"
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9
    }
  }
}
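Extended Stats Aggregation returns everything Stats Aggregation does plus four more fields: the sum of squares (sum_of_squares), the variance (variance), the standard deviation (std_deviation), and the interval of the mean plus or minus two standard deviations (std_deviation_bounds). For example: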
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "extended_stats": {
        "field": "line_id"
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9,
      "sum_of_squares" : 4.57201930511864E14,
      "variance" : 1.0338374861198297E9,
      "std_deviation" : 32153.34331169668,
      "std_deviation_bounds" : {
        "upper" : 120022.58049229984,
        "lower" : -8590.792754486894
      }
    }
  }
}
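The extra fields are related by simple formulas, which the following Python sketch works through. It assumes population variance (dividing by count), which is consistent with the numbers in the result above; extended_stats here is a hypothetical local function over a small made-up list, not a client call.

```python
import math


def extended_stats(values):
    """Compute the extra fields using population (not sample) variance."""
    count = len(values)
    avg = sum(values) / count
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / count - avg * avg
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "avg": avg,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + 2 * std_deviation,
            "lower": avg - 2 * std_deviation,
        },
    }


# Hypothetical sample values: mean 5, variance 4, standard deviation 2.
sample = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(sample["std_deviation_bounds"])

# Cross-check against the avg and std_deviation reported in the result above.
reported_avg = 55715.89386890647
reported_std = 32153.34331169668
upper = reported_avg + 2 * reported_std
lower = reported_avg - 2 * reported_std
print(upper, lower)
```

The recomputed bounds agree with the upper and lower values in the response up to floating-point rounding.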
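Percentiles Aggregation computes percentiles. Conceptually, the values of a field are sorted in ascending order, and the p-th percentile is the value below which p percent of the values fall. Elasticsearch approximates these values (with the TDigest algorithm by default), so the results may be interpolated rather than exact field values. For example: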
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_percent": {
      "percentiles": {
        "field": "line_id",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "line_id_percent" : {
      "values" : {
        "1.0" : 1115.3600000000001,
        "5.0" : 5575.834045307443,
        "25.0" : 27887.286615736997,
        "50.0" : 55711.257765161325,
        "75.0" : 83561.89545235902,
        "95.0" : 105830.47105865781,
        "99.0" : 110287.32171428572
      }
    }
  }
}
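As a rough illustration of what a percentile is, the sketch below computes exact percentiles over a sorted list with linear interpolation between neighbouring ranks. This is one common textbook definition and only an assumption for illustration; it is not the TDigest approximation Elasticsearch runs, so its outputs will not match the response above digit for digit.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile of ascending values, with p in [0, 100]."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    # Fractional position of the requested percentile in the sorted list.
    rank = (p / 100) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Hypothetical data: the integers 1..101 in ascending order.
values = list(range(1, 102))
median = percentile(values, 50)
print(median)
```

Sorting every value exactly, as this sketch requires, does not scale to large indices, which is why an approximate structure is used instead.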
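Value Count Aggregation counts the values extracted from a field. For a single-valued field this equals the number of documents that contain the field. For example, to count the documents that have a line_id field: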
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_count": {
      "value_count": {
        "field": "line_id"
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "line_id_count" : {
      "value" : 110486
    }
  }
}
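The difference between counting values and counting documents only shows up with missing or multi-valued fields. A small Python sketch, with a hypothetical value_count function and made-up documents, makes that concrete:

```python
def value_count(docs, field):
    """Count the values of a field across documents, without de-duplicating."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical documents: one single value, one array of two values, one without the field.
docs = [{"line_id": 1}, {"line_id": [2, 3]}, {"speaker": "HAMLET"}]
line_id_values = value_count(docs, "line_id")
print(line_id_values)  # 3
```

Here three values are counted although only two documents carry the field.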
Bucket aggregations are similar to GROUP BY in SQL: Elasticsearch walks the documents and places each one into a bucket according to its content.
Terms Aggregation groups documents by the value of a field. For example, grouping by play_name and counting the documents in each group is equivalent to select play_name, count(*) from shakespeare group by play_name:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "terms": {
        "field": "play_name.keyword",
        "size": 10
      }
    }
  }
}
field corresponds to the column named after GROUP BY, and size limits the result to the 10 buckets with the most documents.
Query result:
{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 72631,
      "buckets" : [
        { "key" : "Hamlet", "doc_count" : 4219 },
        { "key" : "Coriolanus", "doc_count" : 3958 },
        { "key" : "Cymbeline", "doc_count" : 3927 },
        { "key" : "Richard III", "doc_count" : 3911 },
        { "key" : "Antony and Cleopatra", "doc_count" : 3815 },
        { "key" : "Othello", "doc_count" : 3742 },
        { "key" : "King Lear", "doc_count" : 3735 },
        { "key" : "Troilus and Cressida", "doc_count" : 3682 },
        { "key" : "A Winters Tale", "doc_count" : 3469 },
        { "key" : "Henry VIII", "doc_count" : 3397 }
      ]
    }
  }
}
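The grouping and the meaning of sum_other_doc_count can be sketched with Python's Counter. The function terms_agg and the sample list are hypothetical; the last lines only re-add the numbers from the response above.

```python
from collections import Counter


def terms_agg(values, size):
    """Group values, keep the `size` largest groups, and total up the rest."""
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": count} for key, count in top]
    other = sum(counts.values()) - sum(count for _, count in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


# Hypothetical play_name values, one per document.
names = ["Hamlet"] * 3 + ["Othello"] * 2 + ["King Lear"]
result = terms_agg(names, size=2)
print(result)

# Re-adding the response above: top-10 bucket counts plus sum_other_doc_count.
top10 = [4219, 3958, 3927, 3911, 3815, 3742, 3735, 3682, 3469, 3397]
total_docs = sum(top10) + 72631
print(total_docs)  # 110486
```

The total of 110486 matches the count reported by the Stats Aggregation earlier, so the ten buckets plus sum_other_doc_count account for every document counted there.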
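Filter Aggregation defines a single bucket that holds every document matching the given filter; sub-aggregations then run only on that bucket. For example: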
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filter": {
        "term": {
          "text_entry": "apple"
        }
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
The query above selects the documents whose text_entry contains the word apple, then groups them by play_name and counts each group.
Query result:
{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count" : 10,
      "player" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          { "key" : "Taming of the Shrew", "doc_count" : 2 },
          { "key" : "Twelfth Night", "doc_count" : 2 },
          { "key" : "A Midsummer nights dream", "doc_count" : 1 },
          { "key" : "Henry IV", "doc_count" : 1 },
          { "key" : "King Lear", "doc_count" : 1 },
          { "key" : "Loves Labours Lost", "doc_count" : 1 },
          { "key" : "Merchant of Venice", "doc_count" : 1 },
          { "key" : "The Tempest", "doc_count" : 1 }
        ]
      }
    }
  }
}
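The two-step shape of this request (filter first, then group what is left) can be sketched in Python as follows. The function filter_then_terms, the lowercase word split standing in for text analysis, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def filter_then_terms(docs, predicate, field):
    """Keep the documents that match, then count them per value of `field`."""
    matched = [doc for doc in docs if predicate(doc)]
    buckets = Counter(doc[field] for doc in matched)
    return {"doc_count": len(matched), "buckets": dict(buckets)}


def contains_apple(doc):
    # Rough stand-in for matching the analyzed term "apple".
    return "apple" in doc["text_entry"].lower().split()


# Hypothetical documents, not real lines from the plays.
docs = [
    {"play_name": "Twelfth Night", "text_entry": "an apple here"},
    {"play_name": "Twelfth Night", "text_entry": "another apple there"},
    {"play_name": "King Lear", "text_entry": "one apple more"},
    {"play_name": "Hamlet", "text_entry": "no fruit at all"},
]
filtered = filter_then_terms(docs, contains_apple, "play_name")
print(filtered)
```

In the real response above the same relationship holds: the eight bucket counts add up to the outer doc_count of 10.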
Unlike Filter Aggregation, Filters Aggregation accepts several filters and creates one bucket per filter. For example:
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filters": {
        "filters": [
          {
            "match": {
              "text_entry": "apple"
            }
          }
        ]
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
The filters array can hold multiple filters; the example above defines just one.
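With more than one filter, each filter produces its own bucket, and a document can land in several buckets at once. The Python sketch below illustrates this with named filters; filters_agg, has_word, and the sample documents are hypothetical, and the word split is only a rough stand-in for a match query.

```python
def has_word(word):
    """Build a predicate that checks whether text_entry contains `word`."""
    def predicate(doc):
        return word in doc["text_entry"].lower().split()
    return predicate


def filters_agg(docs, named_filters):
    """Return one bucket per named filter with the number of matching documents."""
    return {
        name: {"doc_count": sum(1 for doc in docs if predicate(doc))}
        for name, predicate in named_filters.items()
    }


# Hypothetical documents; the first one matches both filters.
docs = [
    {"text_entry": "an apple and a pear"},
    {"text_entry": "an apple"},
    {"text_entry": "nothing"},
]
buckets = filters_agg(docs, {"apple": has_word("apple"), "pear": has_word("pear")})
print(buckets)
```

Because buckets may overlap, the bucket counts do not have to add up to the number of documents.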
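Range Aggregation buckets documents by value range and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to bucket line_id into 0 to 10000, 10000 to 50000, and 50000 and above: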
GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "id_range": {
      "range": {
        "field": "line_id",
        "ranges": [
          { "from": 0, "to": 10000 },
          { "from": 10000, "to": 50000 },
          { "from": 50000 }
        ]
      }
    }
  }
}
Query result:
{
  # other fields omitted
  "aggregations" : {
    "id_range" : {
      "buckets" : [
        { "key" : "0.0-10000.0", "from" : 0.0, "to" : 10000.0, "doc_count" : 9909 },
        { "key" : "10000.0-50000.0", "from" : 10000.0, "to" : 50000.0, "doc_count" : 39664 },
        { "key" : "50000.0-*", "from" : 50000.0, "doc_count" : 60913 }
      ]
    }
  }
}
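The boundary handling is the part that is easy to get wrong, so here is a small Python sketch of half-open ranges (from inclusive, to exclusive). The function range_agg and the sample values are hypothetical; the final lines only re-add the counts from the response above.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for spec in ranges:
        low = spec.get("from")
        high = spec.get("to")
        count = sum(
            1
            for value in values
            if (low is None or value >= low) and (high is None or value < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


ranges = [{"from": 0, "to": 10000}, {"from": 10000, "to": 50000}, {"from": 50000}]
# Hypothetical line_id values chosen to sit on the range boundaries.
line_ids = [4, 9999, 10000, 49999, 50000, 111396]
counts = [bucket["doc_count"] for bucket in range_agg(line_ids, ranges)]
print(counts)  # [2, 2, 2]

# Re-adding the bucket counts from the response above.
total_in_ranges = 9909 + 39664 + 60913
print(total_in_ranges)  # 110486
```

The value 10000 falls into the second bucket and 50000 into the third, and because the three ranges do not overlap, the response counts add up to the 110486 documents that have a line_id.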