
ElasticSearch--Deduplicated Queries/Deduplicating by Field--Methods/Examples


Original article: ElasticSearch--去重查询/根据字段去重--方法/实例, IT利刃出鞘's blog (CSDN)

Introduction

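This article shows how to deduplicate documents by a field. It covers both retrieving the deduplicated results and counting the number of distinct values.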

In SQL, deduplication is done with DISTINCT, for example:

  • Get the deduplicated results: SELECT DISTINCT name, sex FROM person;
  • Count the distinct values (MySQL syntax): SELECT COUNT(DISTINCT name, sex) FROM person;

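Elasticsearch can do both as well:

  • Get the deduplicated results
    • Option 1: the collapse (field collapsing) feature, available since ES 5.3
      • Recommended: better performance and lower memory usage
    • Option 2: a terms aggregation on the field plus a top_hits sub-aggregation
      • Not recommended: poorer performance and higher memory usage
  • Count the distinct values
    • A cardinality aggregation (note that its count is approximate, computed with HyperLogLog++)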
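As a rough sketch of what the three approaches look like as request bodies, the Python below builds each one as a plain dict and prints it as JSON. It assumes the author.keyword field used later in this article; the aggregation names by_author, top_doc and author_count are hypothetical, and match_all stands in for whatever query you actually run.

```python
import json

# Assumption: deduplicate on the keyword sub-field used later in this article.
FIELD = "author.keyword"

# Option 1: field collapsing returns one top hit per distinct value.
collapse_body = {
    "query": {"match_all": {}},
    "collapse": {"field": FIELD},
}

# Option 2: terms aggregation (one bucket per distinct value)
# with a top_hits sub-aggregation that keeps one document per bucket.
terms_top_hits_body = {
    "size": 0,  # skip the normal hit list; only the buckets are needed
    "aggs": {
        "by_author": {  # hypothetical aggregation name
            "terms": {"field": FIELD, "size": 10},
            "aggs": {"top_doc": {"top_hits": {"size": 1}}},
        }
    },
}

# Distinct count: cardinality aggregation (approximate).
cardinality_body = {
    "size": 0,
    "aggs": {"author_count": {"cardinality": {"field": FIELD}}},
}

for name, body in [
    ("collapse", collapse_body),
    ("terms + top_hits", terms_top_hits_body),
    ("cardinality", cardinality_body),
]:
    print(name, json.dumps(body))
```

The terms aggregation only returns its top size buckets (10 here), so Option 2 can drop distinct values unless size is raised, and raising it increases the number of buckets held in memory.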

Index mapping and data

Index mapping

http://localhost:9200/
PUT blog

{
  "mappings": {
    "properties": {
      "blogId": {
        "type": "long"
      },
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "author": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "category": {
        "type": "keyword"
      },
      "createTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss||epoch_millis"
      },
      "updateTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss||epoch_millis"
      },
      "status": {
        "type": "integer"
      },
      "serialNum": {
        "type": "keyword"
      }
    }
  }
}
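If you prefer to send the mapping from code rather than from Postman, the sketch below uses only the Python standard library to build the PUT /blog request. It assumes a local, unauthenticated node at http://localhost:9200 (as in the article) and abbreviates the mapping to the two fields that matter for deduplication; build_create_index_request is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication

# Abbreviated copy of the mapping above: only the fields relevant to deduplication.
mapping = {
    "mappings": {
        "properties": {
            "author": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "category": {"type": "keyword"},
        }
    }
}

def build_create_index_request(index, body):
    """Build (but do not send) the PUT /<index> request that creates the index."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_create_index_request("blog", mapping)
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
```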

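Data

  • Each document must occupy exactly one line; it must not contain line breaks. The request body must also end with a newline.
  • Run this request in Postman; running it from the head plugin fails.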

http://localhost:9200/
POST _bulk

{"index":{"_index":"blog","_id":1}}
{"blogId":1,"title":"Spring Data ElasticSearch学习教程1","content":"这是批量添加的文档1","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"1","createTime":"2021-10-10 11:52:01.249","updateTime":null}
{"index":{"_index":"blog","_id":2}}
{"blogId":2,"title":"Spring Data ElasticSearch学习教程2","content":"这是批量添加的文档2","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"2","createTime":"2021-10-10 11:52:02.249","updateTime":null}
{"index":{"_index":"blog","_id":3}}
{"blogId":3,"title":"Spring Data ElasticSearch学习教程3","content":"这是批量添加的文档3","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"3","createTime":"2021-10-10 11:52:03.249","updateTime":null}
{"index":{"_index":"blog","_id":4}}
{"blogId":4,"title":"Spring Data ElasticSearch学习教程4","content":"这是批量添加的文档4","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"4","createTime":"2021-10-10 11:52:04.249","updateTime":null}
{"index":{"_index":"blog","_id":5}}
{"blogId":5,"title":"Spring Data ElasticSearch学习教程5","content":"这是批量添加的文档5","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"5","createTime":"2021-10-10 11:52:05.249","updateTime":null}
{"index":{"_index":"blog","_id":6}}
{"blogId":6,"title":"Java学习教程6","content":"这是批量添加的文档6","author":"Tony","category":"ElasticSearch","status":1,"serialNum":"6","createTime":"2021-10-10 11:52:06.249","updateTime":null}
{"index":{"_index":"blog","_id":7}}
{"blogId":7,"title":"Java学习教程7","content":"这是批量添加的文档7","author":"Pepper","category":"ElasticSearch","status":1,"serialNum":"7","createTime":"2021-10-10 11:52:07.249","updateTime":null}
{"index":{"_index":"blog","_id":8}}
{"blogId":8,"title":"Java学习教程8","content":"这是批量添加的文档8","author":"Pepper","category":"ElasticSearch","status":1,"serialNum":"8","createTime":"2021-10-10 11:52:08.249","updateTime":null}
{"index":{"_index":"blog","_id":9}}
{"blogId":9,"title":"Java学习教程9","content":"这是批量添加的文档9","author":"Pepper","category":"ElasticSearch","status":1,"serialNum":"9","createTime":"2021-10-10 11:52:09.249","updateTime":null}
{"index":{"_index":"blog","_id":10}}
{"blogId":10,"title":"Java学习教程10","content":"这是批量添加的文档10","author":"Pepper","category":"ElasticSearch","status":1,"serialNum":"10","createTime":"2021-10-10 11:52:10.249","updateTime":null}

[Figure: result after executing the bulk request]
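The one-line-per-document rule for the bulk data is easy to satisfy when the body is generated rather than typed by hand. In the sketch below, build_bulk_body is a hypothetical helper and the two documents are abbreviated copies of the sample data. It serializes each action line and source line with json.dumps, which emits a single line when no indent is given, and ends the body with the trailing newline the bulk API expects. When sending it, a Content-Type of application/x-ndjson is appropriate.

```python
import json

def build_bulk_body(index, docs):
    """Serialize docs into an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": doc["blogId"]}}
        # json.dumps without indent keeps each object on a single line, as _bulk requires.
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Abbreviated copies of two sample documents (only some fields kept).
docs = [
    {"blogId": 6, "title": "Java学习教程6", "author": "Tony"},
    {"blogId": 7, "title": "Java学习教程7", "author": "Pepper"},
]
body = build_bulk_body("blog", docs)
print(body, end="")
```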

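Example: hand-written DSL

The field used for deduplication cannot be a text field: collapse and aggregations work on doc values, which text fields do not have. That is why the author mapping includes a keyword sub-field, and deduplication is done on author.keyword.

If the field is of a type that can be deduplicated directly, such as a numeric, keyword, or date field, just use the field name. For example, if author itself were mapped as keyword, you would write author instead of author.keyword.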
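The field-selection rule above can be expressed as a small helper. This is a hypothetical sketch, not an Elasticsearch API: dedup_field inspects a mapping's properties and returns the name to deduplicate on. It assumes, as in this article's mapping, that keyword sub-fields are named keyword, and it treats numeric, keyword and date types as directly usable, following the list in the paragraph above.

```python
# Types the article lists as directly usable for deduplication: numeric, keyword, date.
DEDUP_TYPES = {"keyword", "long", "integer", "short", "byte", "double", "float", "date"}

def dedup_field(properties, name):
    """Hypothetical helper: return the field name to deduplicate on, or raise ValueError."""
    spec = properties[name]
    if spec["type"] in DEDUP_TYPES:
        return name  # usable as-is
    if spec["type"] == "text":
        # Assumption: the keyword sub-field is named "keyword", as in the blog mapping.
        sub = spec.get("fields", {}).get("keyword")
        if sub is not None and sub["type"] == "keyword":
            return f"{name}.keyword"
    raise ValueError(f"{name!r} ({spec['type']}) has no field that can be deduplicated on")

properties = {
    "author": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "content": {"type": "text"},
    "category": {"type": "keyword"},
    "status": {"type": "integer"},
}
print(dedup_field(properties, "author"))    # author.keyword
print(dedup_field(properties, "category"))  # category
```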

Getting deduplicated results with collapse

POST /blog/_search

{
  "query": {
    "match": {
      "title": {
        "query": "java"
      }
    }
  },
  "collapse": {
    "field": "author.keyword"
  }
}
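To see what collapse returns, the sketch below models it locally: walking the hits in score order and keeping the first hit for each author is the same as keeping each group's top hit. The five Java documents come from the sample data, but the relative order of documents 7, 8 and 10 is an assumption for illustration; only the scores of documents 9 and 6 appear in the response.

```python
def collapse(sorted_hits, key):
    """Local model of field collapsing: keep the top hit for each distinct key.

    sorted_hits must already be in the query's sort order (by _score here),
    so the first hit seen for a key is that group's top hit.
    """
    seen = set()
    result = []
    for hit in sorted_hits:
        value = hit[key]
        if value not in seen:
            seen.add(value)
            result.append(hit)
    return result

# The five "Java" documents; their exact score order is assumed for illustration.
hits = [
    {"_id": "9", "author": "Pepper"},
    {"_id": "7", "author": "Pepper"},
    {"_id": "6", "author": "Tony"},
    {"_id": "8", "author": "Pepper"},
    {"_id": "10", "author": "Pepper"},
]
print([h["_id"] for h in collapse(hits, "author")])  # ['9', '6']
```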

Result (full response):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "blog",
        "_type": "_doc",
        "_id": "9",
        "_score": 1.0596458,
        "_source": {
          "blogId": 9,
          "title": "Java学习教程9",
          "content": "这是批量添加的文档9",
          "author": "Pepper",
          "category": "ElasticSearch",
          "status": 1,
          "serialNum": "9",
          "createTime": "2021-10-10 11:52:09.249",
          "updateTime": null
        },
        "fields": {
          "author.keyword": [
            "Pepper"
          ]
        }
      },
      {
        "_index": "blog",
        "_type": "_doc",
        "_id": "6",
        "_score": 0.7361701,
        "_source": {
          "blogId": 6,
          "title": "Java学习教程6",
          "content": "这是批量添加的文档6",
          "author": "Tony",
          "category": "ElasticSearch",
          "status": 1,
          "serialNum": "6",
          "createTime": "2021-10-10 11:52:06.249",
          "updateTime": null
        },
        "fields": {
          "author.keyword": [
            "Tony"
          ]
        }
      }
    ]
  }
}
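Reading the collapse response programmatically makes one detail visible: hits.total.value is 5, the number of documents that matched the query before collapsing, not the number of distinct authors. The collapsed values themselves are returned in each hit's fields entry. The sketch below parses a trimmed copy of that response with Python's json module; for the number of distinct values, the cardinality aggregation listed in the introduction is the intended tool.

```python
import json

# Trimmed copy of the collapse response (only the fields read below).
response = json.loads("""
{
  "hits": {
    "total": {"value": 5, "relation": "eq"},
    "hits": [
      {"_id": "9", "fields": {"author.keyword": ["Pepper"]}},
      {"_id": "6", "fields": {"author.keyword": ["Tony"]}}
    ]
  }
}
""")

matched = response["hits"]["total"]["value"]  # documents matched before collapsing
authors = [hit["fields"]["author.keyword"][0] for hit in response["hits"]["hits"]]

print(matched)  # 5
print(authors)  # ['Pepper', 'Tony']
```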

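Getting deduplicated results with aggregations

The above is only part of the original article. For easier maintenance, the full text has been moved to: ElasticSearch-去重查询的方法 - 自学精灵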
