Elasticsearch: Aggregation Queries (AbstractAggregationBuilder)

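Aggregations (aggs)

Aggregations are typically used for statistical analysis of data, similar to GROUP BY in MySQL.

Aggregations are built on two basic concepts: buckets and metrics.

A bucket groups documents according to some criterion; each group of documents is one bucket. For example, grouping phones by brand yields a Xiaomi (小米) bucket and a Huawei (华为) bucket.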
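
To make the bucket idea concrete, here is a minimal in-memory sketch in plain Java; it does not talk to Elasticsearch. The class name `BrandBuckets` and the method `countByBrand` are hypothetical names chosen for illustration. The sketch only mimics the outcome of grouping by brand: one bucket per distinct value, together with the number of entries in it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Elasticsearch API.
class BrandBuckets {
    // Puts equal brand values into the same bucket and counts the entries per bucket.
    static Map<String, Long> countByBrand(List<String> brands) {
        return brands.stream()
                .collect(Collectors.groupingBy((String b) -> b, TreeMap::new, Collectors.counting()));
    }
}
```

Calling `BrandBuckets.countByBrand` with three "Huawei" entries and two "Xiaomi" entries gives a map with two keys, which plays the role of the bucket list in an aggregation response.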

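Bucket types

Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, documents are automatically split into one group per week
Histogram Aggregation: groups by numeric interval, analogous to the date version
Terms Aggregation: groups by term; documents whose terms match exactly fall into the same group
Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range, and documents are grouped per range

Elasticsearch's grouping options are quite powerful. A plain MySQL GROUP BY on a column corresponds most closely to the Terms Aggregation, whereas Elasticsearch also offers interval-based and range-based grouping as built-in aggregation types.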

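Metrics

Metrics are similar to MySQL functions such as AVG and MAX; they compute values such as the average or the maximum within each group.

Some commonly used metric aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles (for example, the median or the 95th percentile)
Stats Aggregation: returns count, min, max, avg, and sum together
Sum Aggregation: sum
Top hits Aggregation: the top matching documents in each bucket
Value Count Aggregation: the number of values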
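
As a rough in-memory analogy for the Stats Aggregation, the JDK's `DoubleSummaryStatistics` collects the same five values (count, min, max, average, sum) in a single pass. The class name `PriceStats` is hypothetical, and this sketch runs on a local list rather than on an index.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch API.
class PriceStats {
    // Computes count, min, max, average and sum over the given prices in one pass.
    static DoubleSummaryStatistics stats(List<Double> prices) {
        return prices.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
    }
}
```

The individual values are then available through `getCount()`, `getMin()`, `getMax()`, `getAverage()` and `getSum()`, which correspond to what the single-value metrics (Value Count, Min, Max, Avg, Sum) would each return on their own.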

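Terms buckets

Let's start with the simplest bucket type, the terms bucket. In the request below, brand_aggs is a user-defined name for the aggregation, terms selects the terms bucket type, and "field": "brand" splits documents into buckets by the brand field. Setting size to 0 means that no search hits are returned. This also shows that paging does not affect aggregation results: aggregations are computed over all documents matching the query, so a paged query and its aggregation results can be returned in the same response.

The following query groups documents by brand name and counts them: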

    GET /goods/_search
    {
      "size": 0,
      "aggs": {
        "brand_aggs": {
          "terms": {
            "field": "brand"
          }
        }
      }
    }

Query result:

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "brand_aggs" : {                        // name of the aggregation
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [                         // one entry per bucket
            {
              "key" : "华为",                    // brand name, because we grouped by brand
              "doc_count" : 3                   // number of documents in the bucket
            },
            {
              "key" : "小米",
              "doc_count" : 2
            }
          ]
        }
      }
    }

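As you can see, the document count of each bucket is returned by default, without adding any metric. To get the average phone price per brand, you need to add a metric.

Average metric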

    GET /goods/_search
    {
      "size": 0,
      "aggs": {
        "brand_aggs": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }

Response:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "brand_aggs" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "华为",
              "doc_count" : 3,
              "avg_price" : {
                "value" : 4500.0
              }
            },
            {
              "key" : "小米",
              "doc_count" : 2,
              "avg_price" : {
                "value" : 5000.0
              }
            }
          ]
        }
      }
    }

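Java implementation

    // Assumed versions: Elasticsearch 7.x client and a Spring Data Elasticsearch 4.x
    // release in which SearchHits#getAggregations() returns Aggregations.
    import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.terms.Terms;
    import org.elasticsearch.search.aggregations.metrics.ParsedAvg;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.elasticsearch.core.SearchHits;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

    public void testAggs() {
        // Group by brand
        AbstractAggregationBuilder<?> aggregationBuilder = AggregationBuilders.terms("brand_aggs").field("brand");
        // Average metric: mean of price within each brand bucket
        aggregationBuilder.subAggregation(AggregationBuilders.avg("avg_price").field("price"));
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                .withPageable(PageRequest.of(0, 1)) // PageRequest requires a page size of at least 1
                .addAggregation(aggregationBuilder)
                .build();
        SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
        Terms brandTerms = goodsInfos.getAggregations().get("brand_aggs");
        brandTerms.getBuckets().forEach(bucket -> {
            System.out.println(bucket.getKey());      // brand name
            System.out.println(bucket.getDocCount()); // number of documents in the bucket
            ParsedAvg avgPrice = bucket.getAggregations().get("avg_price"); // average price
            System.out.println(avgPrice.getValue());
        });
    }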
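
For readers who want to see what the nested avg_price metric computes without a running cluster, here is a small in-memory sketch in plain Java. The `Phone` class, the `BrandAveragePrice` helper and the sample prices are all hypothetical; the prices were chosen so that the averages agree with the response shown above (4500.0 and 5000.0), but the article does not list the actual documents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Elasticsearch API.
class BrandAveragePrice {
    // A hypothetical document holding the two fields used by the aggregation.
    static final class Phone {
        final String brand;
        final double price;

        Phone(String brand, double price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Assumed sample data; not taken from the article's index.
    static List<Phone> sample() {
        return Arrays.asList(
                new Phone("Huawei", 3500), new Phone("Huawei", 4500), new Phone("Huawei", 5500),
                new Phone("Xiaomi", 4500), new Phone("Xiaomi", 5500));
    }

    // Bucket by brand first, then apply the average metric inside each bucket.
    static Map<String, Double> averagePriceByBrand(List<Phone> phones) {
        return phones.stream()
                .collect(Collectors.groupingBy(
                        (Phone p) -> p.brand,
                        TreeMap::new,
                        Collectors.averagingDouble((Phone p) -> p.price)));
    }
}
```

The structure mirrors the request: the outer `groupingBy` plays the role of the terms bucket, and the downstream collector plays the role of the sub-aggregation.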

Histogram buckets

The following example counts phones per price band, using an interval of 500:

    GET /goods/_search
    {
      "size": 0,
      "aggs": {
        "price_histogram": {
          "histogram": {
            "field": "price",
            "interval": 500
          }
        }
      }
    }

Result:

    {
      "took" : 103,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "price_histogram" : {
          "buckets" : [
            {
              "key" : 3500.0,
              "doc_count" : 1
            },
            {
              "key" : 4000.0,
              "doc_count" : 0
            },
            {
              "key" : 4500.0,
              "doc_count" : 2
            },
            {
              "key" : 5000.0,
              "doc_count" : 0
            },
            {
              "key" : 5500.0,
              "doc_count" : 2
            }
          ]
        }
      }
    }

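Code:

    // Assumed versions: Elasticsearch 7.x client and a Spring Data Elasticsearch 4.x
    // release in which SearchHits#getAggregations() returns Aggregations.
    import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.histogram.ParsedHistogram;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.elasticsearch.core.SearchHits;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

    public void testHistogram() {
        // One bucket per price interval of 500
        AbstractAggregationBuilder<?> aggregationBuilder = AggregationBuilders.histogram("price_histogram").field("price").interval(500);
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                .withPageable(PageRequest.of(0, 1)) // PageRequest requires a page size of at least 1
                .addAggregation(aggregationBuilder)
                .build();
        SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
        ParsedHistogram priceHistogram = goodsInfos.getAggregations().get("price_histogram");
        priceHistogram.getBuckets().forEach(bucket -> {
            System.out.println(bucket.getKey());      // bucket key (lower bound of the interval)
            System.out.println(bucket.getDocCount()); // number of documents in the bucket
        });
    }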
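
The bucket keys in a histogram follow a simple rounding rule: with no offset, a value lands in the bucket whose key is floor(value / interval) * interval. The sketch below reproduces that rule in plain Java and also fills the gaps between the smallest and largest key with empty buckets, which is why the response above contains entries with a doc_count of 0. The class name `PriceHistogram` and the sample prices used to exercise it are hypothetical.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of any Elasticsearch API.
class PriceHistogram {
    // Rounding rule for a histogram bucket key, assuming an offset of 0.
    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Counts values per bucket, then adds empty buckets for every missing key
    // between the first and the last non-empty bucket.
    static TreeMap<Double, Long> histogram(List<Double> values, double interval) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        if (!buckets.isEmpty()) {
            double first = buckets.firstKey();
            long steps = Math.round((buckets.lastKey() - first) / interval);
            for (long i = 1; i < steps; i++) {
                buckets.putIfAbsent(first + i * interval, 0L);
            }
        }
        return buckets;
    }
}
```

With an interval of 500, a price of 3999 still belongs to the 3500 bucket, while a price of exactly 4000 starts the next one.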

Range buckets (Range Aggregation)

Count the phones priced from 4000 (inclusive) to 6000 (exclusive):

    GET /goods/_search
    {
      "size": 0,
      "aggs": {
        "price_range": {
          "range": {
            "field": "price",
            "ranges": [
              {
                "from": 4000,
                "to": 6000
              }
            ]
          }
        }
      }
    }

Result:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "price_range" : {
          "buckets" : [
            {
              "key" : "4000.0-6000.0",
              "from" : 4000.0,
              "to" : 6000.0,
              "doc_count" : 4
            }
          ]
        }
      }
    }

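Code:

    // Assumed versions: Elasticsearch 7.x client and a Spring Data Elasticsearch 4.x
    // release in which SearchHits#getAggregations() returns Aggregations.
    import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.range.ParsedRange;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.elasticsearch.core.SearchHits;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
    import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

    public void testRangeAggrs() {
        // Count documents in the half-open range [4000, 6000)
        AbstractAggregationBuilder<?> aggregationBuilder = AggregationBuilders.range("price_range").field("price").addRange(4000, 6000);
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                .withPageable(PageRequest.of(0, 1)) // PageRequest requires a page size of at least 1
                .addAggregation(aggregationBuilder)
                .build();
        SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
        ParsedRange priceRange = goodsInfos.getAggregations().get("price_range");
        priceRange.getBuckets().forEach(bucket -> {
            System.out.println(bucket.getKey());      // range key, e.g. "4000.0-6000.0"
            System.out.println(bucket.getDocCount()); // number of documents in the range
        });
    }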
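
One detail that is easy to miss is that a range bucket includes its from value and excludes its to value. The plain-Java sketch below illustrates that half-open behaviour on a local list; `PriceRange` and `countInRange` are hypothetical names, and no Elasticsearch call is involved.

```java
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch API.
class PriceRange {
    // Counts prices in the half-open interval [from, to):
    // the lower bound is included, the upper bound is not.
    static long countInRange(List<Double> prices, double from, double to) {
        return prices.stream()
                .filter(p -> p >= from && p < to)
                .count();
    }
}
```

A phone priced at exactly 4000 is counted by the 4000 to 6000 range, while one priced at exactly 6000 is not; to cover it, a second range starting at 6000 would be needed.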

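Date histogram buckets (Date Histogram Aggregation)

The following query groups the documents in the cars index by the month of the sold field, evaluated in the +08:00 time zone, and omits months that contain no documents (min_doc_count of 1). Note that the interval parameter is deprecated since Elasticsearch 7.2 in favor of calendar_interval and fixed_interval.

    GET /cars/_search
    {
      "size": 0,
      "aggs": {
        "date": {
          "date_histogram": {
            "field": "sold",
            "interval": "1M",
            "format": "yyyy-MM",
            "time_zone": "+08:00",
            "min_doc_count": 1
          }
        }
      }
    }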

Result:

    "aggregations" : {
      "date" : {
        "buckets" : [
          {
            "key_as_string" : "2013-12",
            "key" : 1385859600000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-02",
            "key" : 1391216400000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-05",
            "key" : 1398906000000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-07",
            "key" : 1404176400000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-08",
            "key" : 1406854800000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-10",
            "key" : 1412125200000,
            "doc_count" : 1
          },
          {
            "key_as_string" : "2014-11",
            "key" : 1414803600000,
            "doc_count" : 2
          }
        ]
      }
    }
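
To illustrate how a monthly bucket depends on the time zone, here is a small `java.time` sketch that rounds a timestamp down to the start of its calendar month in a given zone, then derives an epoch-millisecond key and a "yyyy-MM" label from it. `MonthBucket` and its methods are hypothetical names, and the sketch is only an approximation of the rounding a date histogram performs, not the Elasticsearch implementation.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of any Elasticsearch API.
class MonthBucket {
    // Start of the calendar month containing the instant, as seen in the given zone.
    static ZonedDateTime monthStart(Instant instant, ZoneId zone) {
        return instant.atZone(zone)
                .withDayOfMonth(1)
                .truncatedTo(ChronoUnit.DAYS);
    }

    // Numeric bucket key: the month start expressed as epoch milliseconds.
    static long key(Instant instant, ZoneId zone) {
        return monthStart(instant, zone).toInstant().toEpochMilli();
    }

    // Formatted bucket label: the month start rendered as "yyyy-MM" in the given zone.
    static String keyAsString(Instant instant, ZoneId zone) {
        return DateTimeFormatter.ofPattern("yyyy-MM").format(monthStart(instant, zone));
    }
}
```

For instance, the instant 2014-01-31T20:00:00Z is already 1 February in the +08:00 zone, so `keyAsString` returns "2014-02" for `ZoneId.of("+08:00")` but "2014-01" for `ZoneId.of("UTC")`. This is why the time_zone setting can move documents near a month boundary into a different bucket.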
