
ElasticSearch Notes: SearchSourceBuilder.aggregation


0、Configuration


Windows

1、ES

  • Start by running elasticsearch.bat

  • Enable CORS in config/elasticsearch.yml

http.cors.enabled: true
http.cors.allow-origin: "*"

2、elasticsearch-head-master

  • Install dependencies: npm install
  • Start: npm run start

3、Kibana

  • Start by running kibana.bat

Linux

  • Pull the images
docker pull elasticsearch:7.6.2
docker pull kibana:7.6.2
  • Create the directories and config file to mount
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /mydata/elasticsearch/
  • Run the ES container
docker run --name es -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.6.2 
  • Run the Kibana container
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 \
-d kibana:7.6.2
# use your own VM's IP address for ELASTICSEARCH_HOSTS
  • Start the containers automatically on boot
docker update es --restart=always
docker update kibana --restart=always

1、Basic Syntax




01、Two Ways to Search


Option 1: URI search

  • URI + query parameters
GET bank/_search?q=*&sort=account_number:asc



Option 2: Query DSL

  • URI + request body (Query DSL)
GET bank/_search
{
  "query": {
    "match_all": {} //match all documents
  },
  "sort": [
    {
      "account_number": {
        "order":"asc" //ascending (full form)
      } 
    },
    {
      "balance": "desc" //descending (shorthand)
    }
  ],
  "from": 0, //start from this offset
  "size": 6  //number of hits to return
}

Syntax

  • Typical structure of a query clause
{
  QUERY_NAME:{
    ARGUMENT:VALUE,
    ARGUMENT:VALUE,......
  }
}
  • Structure when the query targets a specific field
{
  QUERY_NAME:{
    FIELD_NAME: {
      ARGUMENT:VALUE,
      ARGUMENT:VALUE,......
    }
  }
}

02、match (full-text search)


  • Full-text search tokenizes the query terms and ranks results by relevance score; see the example below

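For example, a minimal match query against the same bank sample index (a sketch; the exact query in the original screenshot may differ):

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}

Because match tokenizes the input, this returns every document whose address contains either mill or lane, ordered by _score; compare this with match_phrase below.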


03、match_phrase (phrase matching)


  • Treats the value to match as one whole phrase (not tokenized, whereas match tokenizes) when searching
GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}



04、multi_match (multi-field matching)


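A minimal multi_match sketch against the bank index (the query string is analyzed and matched against every listed field; a document matches if any of the fields match):

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["address", "city"]
    }
  }
}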


05、bool (compound queries: must, must_not, should)


  • bool is used to build compound queries; see the example below.
  • A compound clause can combine any other query clauses, including other compound clauses. This means compound clauses can be nested inside one another, allowing very complex logic to be expressed.

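A minimal sketch of a bool query combining the three clause types (the field values are chosen only for illustration):

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "gender": "M" } },
        { "match": { "address": "mill" } }
      ],
      "must_not": [
        { "match": { "age": 38 } }
      ],
      "should": [
        { "match": { "lastname": "Wallace" } }
      ]
    }
  }
}

must clauses are required and contribute to the score, must_not clauses exclude documents, and should clauses are optional but raise the score of documents that satisfy them.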


06、filter (result filtering, e.g. range)


  • Not every query needs to produce a score, in particular clauses used only for "filtering" documents. Elasticsearch automatically detects these situations and optimizes query execution so that scores are not computed; see the example below.

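A minimal sketch that filters on a balance range (the boundary values are chosen only for illustration):

GET bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "balance": {
            "gte": 10000,
            "lte": 20000
          }
        }
      }
    }
  }
}

Clauses placed under filter do not contribute to _score, which is exactly the optimization described above.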


07、term (exact match) and .keyword (exact match on full-text fields)


  • Like match, it matches against a field's value. Use match for full-text fields and term for exact matches; see the example below.
  • In short: match for text fields, term for non-text fields.

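A minimal sketch contrasting the two: term on a numeric field, and an exact match on a .keyword sub-field (the sample address comes from the data shown later in this section):

GET bank/_search
{
  "query": {
    "term": {
      "age": 38
    }
  }
}

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill Road"
    }
  }
}

The first query matches the exact numeric value; the second matches only documents whose full address is exactly "990 Mill Road", because the keyword sub-field is not analyzed.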


08、aggregations


Aggregations provide the ability to group and extract statistics from your data. The simplest aggregations are roughly equivalent to SQL GROUP BY and SQL aggregate functions. In Elasticsearch, a single search can return hits and aggregation results at the same time, kept separate within one response. This is very powerful and efficient: you can run a query together with multiple aggregations and get all of their results back in a single round trip, using one concise, simplified API.


Simple aggregations

Example 1: Search for everyone whose address contains mill, and return their age distribution and average age without returning the document details (add size: 0).

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  },
  "size": 0
}



Java client version

  • Entity class
@Data
@ToString
static class Account {

  private int account_number;
  private int balance;
  private String firstname;
  private String lastname;
  private int age;
  private String gender;
  private String address;
  private String employer;
  private String email;
  private String city;
  private String state;
}
  • Test method
@Test
public void searchData() throws IOException {
  //1. Create a search request
  SearchRequest searchRequest = new SearchRequest();
  //specify the index
  searchRequest.indices("bank");
  //specify the search conditions by creating a SearchSourceBuilder
  // SearchSourceBuilder sourceBuilder holds the query conditions
  SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

  //1.1  Build the query conditions
  //root-level operations
  //        sourceBuilder.query();
  //        sourceBuilder.from();
  //        sourceBuilder.size();
  //        sourceBuilder.aggregation(); //aggregations
  //query documents containing a keyword: here, all documents whose address field contains the value mill
  sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));//QueryBuilders utility class

  //1.2  Aggregate by the distribution of age values
  //AggregationBuilders.terms("name").field("id"); is equivalent to SQL's GROUP BY
  TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
  sourceBuilder.aggregation(ageAgg);

  //1.3  Average age
  AvgAggregationBuilder ageAvg = AggregationBuilders.avg("ageAvg").field("age");
  sourceBuilder.aggregation(ageAvg);

  //1.4  Average balance
  AvgAggregationBuilder balanceAvg = AggregationBuilders.avg("balanceAvg").field("balance");
  sourceBuilder.aggregation(balanceAvg);

  System.out.println("检索条件" + sourceBuilder.toString());

  //add the SearchSourceBuilder to the search request
  searchRequest.source(sourceBuilder);

  //2. Execute the search
  SearchResponse searchResponse = client.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);
  //3. Analyze the result: searchResponse
  System.out.println(searchResponse.toString());

  //        JSON.parseObject(searchResponse.toString(), Map.class);

  //3.1  Get all matching documents
  /***
         *      {
         *   "took" : 3,
         *   "timed_out" : false,
         *   "_shards" : {......},
         *   "hits" : {
         *     "total" : {......},
         *     "max_score" : 5.4032025,
         *     "hits" : [
         *       {
         *         "_index" : "bank",
         *         "_type" : "account",
         *         "_id" : "345",
         *         "_score" : 5.4032025,
         *         "_source" : {
         *           "account_number" : 345,
         *           "balance" : 9812,
         *           "firstname" : "Parker",
         *           "lastname" : "Hines",
         *           "age" : 38,
         *           "gender" : "M",
         *           "address" : "715 Mill Avenue",
         *           "employer" : "Baluba",
         *           "email" : "parkerhines@baluba.com",
         *           "city" : "Blackgum",
         *           "state" : "KY"
         *         }
         *       },
         *       {......}
         *       }
         *     ]
         *   },
         *   "aggregations" : {
         *     "ageAgg" : {
         *       "doc_count_error_upper_bound" : 0,
         *       "sum_other_doc_count" : 0,
         *       "buckets" : [
         *         {
         *           "key" : 38,
         *           "doc_count" : 2
         *         },
         *         {
         *           "key" : 28,
         *           "doc_count" : 1
         *         },
         *         {
         *           "key" : 32,
         *           "doc_count" : 1
         *         }
         *       ]
         *     },
         * 	   "ageAvg": {
         * 		  "value": 34.0
         *     },
         *     "balanceAvg" : {
         *       "value" : 25208.0
         *     }
         *   }
         * }
         */
  SearchHits hits = searchResponse.getHits();     //outer hits
  SearchHit[] searchHits = hits.getHits();    //inner hits (the actual records)
  for (SearchHit hit : searchHits){
    /***
             *         "_index" : "bank",
             *         "_type" : "account",
             *         "_id" : "345",
             *         "_score" : 5.4032025,
             *         "_source" : { ...... } //the data we want is inside _source
             */
    //            hit.getIndex();
    //            hit.getType();
    //            hit.getId();
    //            hit.getScore();
    //            hit.getScore();

    String string = hit.getSourceAsString(); //the hit as a JSON string
    Account account = JSON.parseObject(string, Account.class); //JSON string --> Account object
    System.out.println("每个Account对象:" + account);
  }

  //3.2  Get the aggregation results of this search
  Aggregations aggregations = searchResponse.getAggregations(); //all aggregation results
  for (Aggregation aggregation : aggregations.asList()) { //convert to a list and iterate
    System.out.println("当前聚合:" + aggregation.getName());

  }
  //iterate over the aggregation buckets
  Terms ageAgg1 = aggregations.get("ageAgg");
  for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
    System.out.println("年龄:" + bucket.getKeyAsString());
    System.out.println("人数:" + bucket.getDocCount());
  }

  Avg ageAvg1 = aggregations.get("ageAvg");
  System.out.println("平均年龄:" + ageAvg1.getValue());

  Avg balanceAvg1 = aggregations.get("balanceAvg");
  System.out.println("平均薪资:" + balanceAvg1.getValue());
}
  • Output
检索条件{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"ageAvg":{"avg":{"field":"age"}},"balanceAvg":{"avg":{"field":"balance"}}}}

{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]},"avg#ageAvg":{"value":34.0},"avg#balanceAvg":{"value":25208.0}}}

每个Account对象:GulimallEsApplicationTests.Account(account_number=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
每个Account对象:GulimallEsApplicationTests.Account(account_number=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
每个Account对象:GulimallEsApplicationTests.Account(account_number=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
每个Account对象:GulimallEsApplicationTests.Account(account_number=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
  
当前聚合:ageAgg
当前聚合:ageAvg
当前聚合:balanceAvg
年龄:38
人数:2
年龄:28
人数:1
年龄:32
人数:1
平均年龄:34.0
平均薪资:25208.0

Sub-aggregations

Example 2: Aggregate by age, and compute the average balance for each age group.

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "ageAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}




Example 3: Return the full age distribution, and for each age group the average balance of gender M, the average balance of gender F, and the overall average balance of that age group.

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAgg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}



09、nested (nested objects)


  • Suppose we index a document like this
PUT my-index-000001/_doc/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}
  • Now we want to find documents where user.first is Alice and user.last is Smith, even though no such user was ever indexed
  • Yet the search below does return a hit, which is incorrect
GET my-index-000001/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

ES has pulled John Smith and Alice White apart: the association between fields of the same object is lost.

  • ES flattens the object array internally, like this:
{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}



Fix

  • Recreate the index with the user field mapped as type nested
//1. Delete the old index
DELETE my-index-000001

//2. Create a new index with the user field mapped as type nested
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested" 
      }
    }
  }
}

//3. Re-index the data
PUT my-index-000001/_doc/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

//4. Run the same query again
GET my-index-000001/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}
  • Result
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
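
The query now correctly returns no hits, because Alice and Smith never occur inside the same nested object. To match within a single nested object you would wrap the query in a nested clause; a minimal sketch:

GET my-index-000001/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }}
          ]
        }
      }
    }
  }
}

This one does return the document, since Alice White exists as one complete nested object; searching for Alice + Smith still returns nothing.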

2、Mappings


A mapping defines how a document, and the fields it contains, are stored and indexed. For example, a mapping can be used to define:

  • Which string fields should be treated as full-text fields.
  • Which fields contain numbers, dates, or geolocations.
  • Whether all fields in a document should be indexed (the _all setting).
  • The format of date values.
  • Custom rules to control the mapping of dynamically added fields.

01、Viewing a mapping


View the mapping of the bank index

  • GET bank/_mapping
  • Numbers are mapped as type long
  • Strings are mapped as type text (each with a keyword sub-field for exact matching)
{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {  //long类型
          "type" : "long"
        },
        "address" : {
          "type" : "text", //text类型  text类型会全文检索 -----> 分词
          "fields" : {
            "keyword" : { //有子属性 keyword
              "type" : "keyword", //keyword类型
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}


02、Creating a mapping


PUT wula_index
{
  "mappings": {
    "properties": {
      "age": {"type": "integer"},
      "email": {"type": "keyword"},
      "name": {"type": "text"}
    }
  }
}
  • Result
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "wula_index"
}

Field types

1、string: tokenized by default (text)

2、The main numeric types:

  • long: 64-bit
  • integer: 32-bit
  • short: 16-bit
  • byte: 8-bit
  • double: 64-bit double precision
  • float: 32-bit single precision

3、Compound types

  • Array: there is no dedicated array type; any field can hold zero or more values, as long as they all have the same type
  • Object: stores JSON-like hierarchical data
  • Nested: an array of objects (Array[Object]) that can be nested level by level

4、Geo types

  • geo_point: stores latitude/longitude and supports distance and range queries
  • geo_shape: supports queries over arbitrary shapes, such as rectangles and polygons

5、Specialized types

  • ipv4: stores IP addresses; ES converts them to long internally
  • completion: uses an FST (finite state transducer) to provide suggest/prefix queries
  • token_count: provides token-level counting
  • mapper-murmur3: after installing the plugin (sudo bin/plugin install mapper-size), _size can report the size of the _source data

6、Multi-fields:

  • A field's value can be indexed with several analyzers at once via the fields parameter; this is supported for most ES data types (see the sketch below)
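A minimal sketch combining several of the types above in one mapping (the index name type_demo is made up for illustration):

PUT type_demo
{
  "mappings": {
    "properties": {
      "age":       { "type": "integer" },
      "email":     { "type": "keyword" },
      "title":     { "type": "text",
                     "fields": { "raw": { "type": "keyword" } } },
      "location":  { "type": "geo_point" },
      "client_ip": { "type": "ip" },
      "members":   { "type": "nested" }
    }
  }
}

title is a multi-field here: the text version is analyzed for full-text search, while title.raw keeps the original string for exact matching and aggregations.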

03、Adding fields to a mapping


Add new fields to an existing mapping

  • "index": false means the field is not indexed (and cannot be searched); the default is true
PUT wula_index/_mapping
{
  "properties": {
    "empoyee-id": { //新增加属性
      "type": "keyword", //属性的类型定义为 keyword
      "index": false //不需要被索引(不能被检索) 默认为true
    }
  }
}
  • Result
{
  "acknowledged" : true
}

View this index's mapping again

  • GET wula_index/_mapping
{
  "wula_index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "empoyee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}

04、Changing a mapping and migrating data


ES does not support modifying existing mapping fields in place, because changing an existing field's mapping could invalidate data that has already been indexed.

If you need to change a field's mapping, create a new index with the correct mapping and reindex the data into the new index.

Renaming a field would likewise invalidate data indexed under the old field name; instead, add an alias field to create an alternate field name.

1、Create the new mapping

PUT new_bank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}
  • Result
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "new_bank"
}

2、Migrate the data


POST _reindex
{
  "source": {
    "index": "bank", //old index
    "type": "account" //type of the data being migrated
  },
  "dest": {
    "index": "new_bank" //new index
  }
}
  • Result
{
  "took" : 735,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 1000,
  "created" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}



3、The IK Analyzer


1、Install the IK analyzer

  • In this project the IK analyzer plugin is mounted at
  • /mydata/elasticsearch/plugins/ik

Example: test data 1

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "玛卡巴卡"
}

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "玛卡巴卡"
}
  • Result
{
  "tokens" : [
    {
      "token" : "玛",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "卡巴",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "卡",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    }
  ]
}

Notice that 玛卡巴卡 is not recognized as one complete word.

2、Install nginx, mount the conf, html, and logs directories, and start it

  • Then create an es directory under html, and a fenci.txt file inside it
  • Directory: /mydata/nginx/html/es
  • Put the content 玛卡巴卡 into fenci.txt
  • Visiting http://192.168.56.10/es/fenci.txt should display 玛卡巴卡

3、Edit IKAnalyzer.cfg.xml under the IK analyzer's config directory

  • In this project the IK config directory is mounted at
  • /mydata/elasticsearch/plugins/ik/config
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!-- configure your own extension dictionary here -->
        <entry key="ext_dict"></entry>
         <!-- configure your own extension stop-word dictionary here -->
        <entry key="ext_stopwords"></entry>
        <!-- configure a remote extension dictionary here -->
        <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
        <!-- configure a remote extension stop-word dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
  • Restart ES after making the change

4、Test data 2

  • ik_smart
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "玛卡巴卡"
}
  • Result
{
  "tokens" : [
    {
      "token" : "玛卡巴卡",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}
  • ik_max_word
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "玛卡巴卡"
}
  • Result
{
  "tokens" : [
    {
      "token" : "玛卡巴卡",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "卡巴",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "卡",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    }
  ]
}

4、API


00、Test setup


  • Entity class
@Data   //Getter, Setter, equals, canEqual, hashCode, toString
@AllArgsConstructor   //all-args constructor
@NoArgsConstructor    //no-args constructor
@Component
public class User {
    private String name;
    private int age;
}
  • Config
@Configuration
public class ElasticSearchClientConfig {
    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.31.141",9200,"http")));
                return client;
    }
}
  • Utils
public class ESconst {
    public static final String ES_INDEX = "wulawula_index";
}
  • pom
<dependencies>

  <!-- Alibaba fastjson -->
  <dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.62</version>
  </dependency>

  <!-- Elasticsearch -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
  </dependency>
  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
      <exclusion>
        <groupId>org.junit.vintage</groupId>
        <artifactId>junit-vintage-engine</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-maven-plugin</artifactId>
    </plugin>
  </plugins>
</build>
  • Test
@Autowired
    @Qualifier("restHighLevelClient")
    /***
     * The field here should be named restHighLevelClient;
     * if you want to use a different field name, add the annotation @Qualifier("restHighLevelClient")
     * to make explicit which bean should be injected.
     */
    private RestHighLevelClient client   /*restHighLevelClient*/   ;


01、Create an index


@Test
void testCreateIndex() throws IOException {
    //1. Create the create-index request
    CreateIndexRequest request = new CreateIndexRequest("wulawula_index");
    //2. Execute the request with the client and get the CreateIndexResponse back
    CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse);
}
  • Output
org.elasticsearch.client.indices.CreateIndexResponse@8164fe18



02、Check whether an index exists


  • An index is like a database; here we can only check whether it exists
@Test
void testExistIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("wulawula_index");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists); //true
}
  • Output
true

03、Delete an index


@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("wulawula_index");
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged()); // delete.isAcknowledged() returns true
}
  • Output
true

04、Add a document


  • The document body must be JSON
  • Equivalent to PUT /wulawula_index/_doc/1
@Test
void  testAddDocument() throws IOException {
    //create the request
    IndexRequest request = new IndexRequest("wulawula_index");
	//set the id; if omitted, a random id is generated
    request.id("1");
  	//create the object
    User user = new User();
  	user.setName("乌拉");
  	user.setAge(3);

  	//set the timeout; either form works
    //request.timeout(TimeValue.timeValueSeconds(1));
    request.timeout("1s");

    String jsonString = JSON.toJSONString(user);    /* convert to JSON before storing */
  	request.source(jsonString, XContentType.JSON); //content to save; XContentType.JSON: content type

	//execute the index operation:        index(request to save, request options)
  	IndexResponse index = client.index(request, RequestOptions.DEFAULT);

    System.out.println(index);     //status matching the REST command, e.g. CREATED
}
  • Output
IndexResponse[index=wulawula_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]



05、Check whether a document exists


  • Equivalent to GET /wulawula_index/_doc/1
@Test
void testIsExists() throws IOException {
    GetRequest getRequest = new GetRequest("wulawula_index", "1");
    //do not fetch the _source context in the response
    getRequest.fetchSourceContext(new FetchSourceContext(false));
    getRequest.storedFields("_none_");

    boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
    System.out.println(exists); //true
}
  • Output
true

06、Get a document


//get the document's information
@Test
void testGetDocument() throws IOException {
    GetRequest getRequest = new GetRequest("wulawula_index", "1");
    GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
    System.out.println(getResponse.getSourceAsString()); //print the document's _source
    System.out.println(getResponse);
}
  • Output
{"age":3,"name":"乌拉"}
{"_index":"wulawula_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":3,"name":"乌拉"}}



07、Update a document


@Test
void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("wulawula_index", "1");
    updateRequest.timeout("1s");

    User user = new User("乌拉乌拉Java", 18);
    updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

    UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(updateResponse.status());
}
  • Output
OK



08、Delete a document


@Test
void testDeleteDocument() throws IOException {
    DeleteRequest request = new DeleteRequest("wulawula_index", "1");
    request.timeout("1s");

    DeleteResponse deleteResponse = client.delete(request, RequestOptions.DEFAULT);
    System.out.println(deleteResponse.status());
}
  • Output
OK



09、Bulk insert (bulk)


  • Bulk writes with bulk
@Test
void testBulkDocument() throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");

    ArrayList<User> userList = new ArrayList<>();
    userList.add(new User("wula1",1));
    userList.add(new User("wula2",2));
    userList.add(new User("wula3",3));
    userList.add(new User("wula4",6));
    userList.add(new User("wula5",6));
    userList.add(new User("wula6",6));

    // add each request to the batch
    for (int i = 0; i < userList.size(); i++) {

      	// for bulk updates or bulk deletes, just add the corresponding request type here
      	bulkRequest.add(new IndexRequest("wulawula_index")
                      .id(""+(i+1))
                      .source(JSON.toJSONString(userList.get(i)),XContentType.JSON)
        );
    }
	// execute the bulk write
  	BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
  	System.out.println(bulkResponse.hasFailures()); //whether any item failed: false
}
  • Output
false  //whether it failed: false = no failures, true = failed



10、Search


Key classes:
  • SearchRequest — the search request
  • SearchSourceBuilder — builds the query conditions
  • HighlightBuilder — builds highlighting
  • TermQueryBuilder — exact (term) query
  • MatchAllQueryBuilder — match-all query
//search
@Test
void testSearch() throws IOException {
    SearchRequest searchRequest = new SearchRequest(ESconst.ES_INDEX);
    //build the query conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    //query conditions: the QueryBuilders utility class offers quick factory methods
  	//QueryBuilders.termQuery: exact match
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "wula6");
	
  	//set the query
    sourceBuilder.query(termQueryBuilder);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); //timeout
	
  	//attach the builder to the request
    searchRequest.source(sourceBuilder);
	
  	//execute the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSONString(searchResponse.getHits())); //all results are wrapped in hits
    System.out.println("======================================");
 
    for (SearchHit documentFields : searchResponse.getHits().getHits()){
      System.out.println(documentFields.getSourceAsMap());
    }
}
  • Output
{"fragment":true,"hits":[{"fields":{},"fragment":false,"highlightFields":{},"id":"4","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.6931471,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"wula6","age":4},"sourceAsString":"{\"age\":4,\"name\":\"wula6\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"5","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.6931471,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"wula6","age":5},"sourceAsString":"{\"age\":5,\"name\":\"wula6\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1},{"fields":{},"fragment":false,"highlightFields":{},"id":"6","matchedQueries":[],"primaryTerm":0,"rawSortValues":[],"score":0.6931471,"seqNo":-2,"sortValues":[],"sourceAsMap":{"name":"wula6","age":6},"sourceAsString":"{\"age\":6,\"name\":\"wula6\"}","sourceRef":{"fragment":true},"type":"_doc","version":-1}],"maxScore":0.6931471,"totalHits":{"relation":"EQUAL_TO","value":3}}
======================================
{name=wula6, age=4}
{name=wula6, age=5}
{name=wula6, age=6}