Elasticsearch 7.6 (Windows, Single Node): API Usage and JD Search with Highlighting

Author: IT小白 | 2024-05-06 09:04:12

Elasticsearch

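Installing Elasticsearch

We need to download and install both the Elasticsearch server and a client.

Note:
Elasticsearch is written in Java, and this version requires JDK 1.8 or later. Make sure JDK 1.8+ is installed and its environment variables are configured correctly before installing Elasticsearch; otherwise Elasticsearch will fail to start.

Download

Elasticsearch official site: https://www.elastic.co/products/elasticsearch

Official download page: https://www.elastic.co/cn/downloads/elasticsearch (the download can be very slow; a proxy may help)

Windows download: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1.zip

I have already downloaded both the Linux and Windows packages for you.

For learning purposes either Windows or Linux works. For Java developers there is no real difference apart from how you connect to the server. Windows is a little more convenient, so we install and use Elasticsearch on Windows first and deploy to a real Linux server later when running the project.

My personal netdisk link and notes are below. Everything that needs installing later in this article can be fetched from the netdisk; installing just means unzipping.

Link: https://pan.baidu.com/s/1B7cKgRa0y29tJSxmB6ppAQ
Extraction code: oqo7
--llp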

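Installing and running on Windows

After unzipping, the directory layout is:

bin: startup scripts
config: configuration files
    log4j2.properties: logging configuration
    jvm.options: JVM settings
    elasticsearch.yml: Elasticsearch configuration
data: index data directory
lib: library JARs
logs: log directory
modules: feature modules
plugins: plugins

Double-click elasticsearch.bat in the bin directory to start Elasticsearch, and wait until the console log shows that startup has finished.

Then open http://localhost:9200 in a browser. If it returns a JSON document containing the node name, cluster name, and version information, the installation succeeded.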

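Installing a graphical client for ES (elasticsearch-head)

Note: this requires a Node.js environment. We installed it in the earlier front-end course; if you have not, install it first.
Head is a cluster management tool for Elasticsearch that can also be used to browse and query data. It is hosted on GitHub.

URL: https://github.com/mobz/elasticsearch-head/

1. Download elasticsearch-head-master.zip
2. Unzip it and install the dependencies:

cnpm install
npm run start

This starts a local web server on port 9100 that serves elasticsearch-head. Because head (port 9100) and Elasticsearch (port 9200) run on different ports, the browser blocks the connection as a cross-origin request until CORS is enabled. Add the following to elasticsearch.yml:

# CORS settings:
http.cors.enabled: true
http.cors.allow-origin: "*"

Restart Elasticsearch and connect to it from head. You can see that the default cluster name is elasticsearch.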

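Background: ELK

ELK is an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. It is also known as the Elastic Stack. Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful interface. Large-scale full-text search scenarios, like those served by Baidu or Google, can use Elasticsearch as the underlying engine, which shows how capable its search features are. Elasticsearch is often abbreviated to es. Logstash is the central data-flow engine of ELK: it collects data in different formats from different sources (files, data stores, message queues), filters it, and outputs it to different destinations (files, MQ, redis, elasticsearch, kafka, and so on). Kibana presents the data in Elasticsearch through a friendly web UI and provides real-time analysis.
Many developers describe ELK simply as a log-analysis technology stack, but ELK is not limited to log analysis. It supports any data collection and analysis scenario; log analysis is just the most representative use case, not the only one.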

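Installing Kibana

Kibana is an open-source analytics and visualization platform for Elasticsearch, used to search, view, and interact with data stored in Elasticsearch indices. With Kibana you can perform advanced data analysis and present it with a variety of charts, which makes large volumes of data easier to understand. It is simple to operate: the browser-based UI lets you quickly build dashboards that show Elasticsearch query results in real time. Setting up Kibana is straightforward. No coding or extra infrastructure is required, and within minutes you can install Kibana and start monitoring Elasticsearch indices.
Official site: https://www.elastic.co/cn/kibana
1. Download Kibana from https://www.elastic.co/cn/downloads/kibana (the Kibana version must match the Elasticsearch version)

2. Unzip the archive (this takes a while).
3. Go into the bin directory and double-click kibana.bat to start the service (wait for startup to finish). The ELK components basically work out of the box.

4. Open IP:5601. Kibana connects to port 9200, the Elasticsearch port, automatically (Elasticsearch must of course be running at this point), and then Kibana is ready to use.

5. The UI is in English by default; we switch it to Chinese.

The Chinese language pack is at kibana\x-pack\plugins\translations\translations\zh-CN.json
Just add the following line to the configuration file kibana.yml. If Kibana fails to start afterwards, check that the file was saved with UTF-8 encoding.

i18n.locale: "zh-CN"

6. Restart and check the result: the UI is now in Chinese.

At this point Elasticsearch, head, and Kibana are all installed.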

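ES core concepts

Overview

So far we have learned what es is and have installed and started the es service. But how does es store data, what is its data structure, and how does it implement search? Let's first go through the core Elasticsearch concepts.

What are clusters, nodes, indices, types, documents, shards, and mappings?

Elasticsearch is document-oriented. A side-by-side comparison of relational databases and Elasticsearch:

An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).
Physical design:
Behind the scenes Elasticsearch divides each index into multiple shards, and each shard can be moved between the servers in the cluster.
Logical design:
An index type contains multiple documents, for example document 1 and document 2. When we index a document, it can be located through the sequence index ▷ type ▷ document ID. With this combination we can address one specific document. Note: the ID does not have to be an integer; it is actually a string.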

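Documents

Types

A type is a logical container for documents, just as a table is a container for rows in a relational database. The definition of the fields in a type is called a mapping; for example, name is mapped to a string type. We say that documents are schema-free: they do not need to have every field defined in the mapping, and they may add new ones. What does Elasticsearch do when a new field appears? It automatically adds the new field to the mapping, but since it does not know the field's type, it guesses. If the value is 18, Elasticsearch treats it as an integer type. The guess can be wrong, so the safest approach is to define the mappings you need in advance. This is the same discipline as in a relational database: define the fields first, then use them.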

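Indices

An index is a container for mapping types; an Elasticsearch index is a very large collection of documents. The index stores the fields of its mapping types and other settings, and its data is stored across shards. Let's look at how shards work.
Physical design: how nodes and shards work
A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indices. Before version 7.0 a newly created index had 5 primary shards by default; since 7.0 the default is 1 primary shard (the index settings output later in this article shows "number_of_shards":"1"). By default each primary shard has one replica shard.

The figure (not reproduced here) shows a cluster with 3 nodes. A primary shard and its replica are never placed on the same node, so that data is not lost if one node goes down. A shard is in fact a Lucene index: a directory of files containing an inverted index. The structure of the inverted index lets Elasticsearch tell you which documents contain a given keyword without scanning all documents. But wait, what is an inverted index?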

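Inverted index

Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is well suited to fast full-text search. An inverted index consists of a list of all the distinct terms that appear in the documents, and for each term, a list of the documents that contain it. For example, suppose we have two documents with the following content:

Study every day, good good up to forever # content of document 1
To forever, study every day, good good up # content of document 2

To build the inverted index, we first split each document into separate words (also called terms or tokens), then create a sorted list of all distinct terms, and then record which documents each term appears in.

Now, to search for to forever, we only need to look up the documents that contain each of the two terms.

Both documents match, but the first matches better than the second: if terms are compared without lowercasing, document 1 contains both to and forever, while document 2 contains forever and To, which is a different term from to. With no other conditions, both documents containing the keywords are returned.
Here is another example: searching blog posts by tag. The inverted index then maps each tag to the IDs of the posts that carry it.

To find the posts tagged python, looking them up in the inverted index is much faster than scanning all the original data. We only need to look at the tag column and fetch the associated post IDs.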
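
The inverted index described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not how Lucene implements it: the class name InvertedIndexSketch and its methods are hypothetical, and the tokenizer deliberately keeps case so that "To" and "to" stay different terms, which mirrors the two-document example (the standard analyzer in Elasticsearch lowercases tokens instead).

```java
import java.util.*;

class InvertedIndexSketch {
    // Splits text into terms on anything that is not a letter or digit.
    // Case is kept on purpose, so "To" and "to" are different terms.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Maps each distinct term to the sorted set of ids of the documents containing it.
    static Map<String, SortedSet<Integer>> build(Map<Integer, String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : tokenize(doc.getValue())) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // For each document, counts how many distinct query terms it contains.
    static Map<Integer, Integer> search(Map<String, SortedSet<Integer>> index, String query) {
        Map<Integer, Integer> hits = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (Integer id : index.getOrDefault(term, Collections.<Integer>emptySortedSet())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "Study every day, good good up to forever");
        docs.put(2, "To forever, study every day, good good up");
        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(index.get("forever"));          // ids of documents containing "forever"
        System.out.println(search(index, "to forever"));   // matched-term count per document
    }
}
```

The search never scans the document texts: it only reads the posting list of each query term, which is the point of the structure. The matched-term count is a crude stand-in for the relevance score.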

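ES basic operations

IK analyzer plugin

What is the IK analyzer?

Tokenization means splitting a piece of Chinese (or other) text into individual keywords. When we search, our query text is tokenized, the data in the database or index is tokenized as well, and the two are then matched. The default tokenization of Chinese treats every character as a separate word: for example, "我爱狂神" is split into "我", "爱", "狂", "神". This clearly does not meet our needs, so we install the Chinese analyzer IK to solve the problem.
IK provides two tokenization algorithms: ik_smart and ik_max_word. ik_smart produces the coarsest split (fewest tokens), and ik_max_word produces the finest-grained split. We will test both shortly.

Installation steps

1. Download the IK analyzer package from GitHub: https://github.com/medcl/elasticsearch-analysis-ik/ (the version must match your Elasticsearch version)
2. Unzip it and copy the directory into the plugins directory under the Elasticsearch root.

3. Restart the Elasticsearch service. During startup you can see a message that the "analysis-ik" plugin is being loaded. After the service has started, run elasticsearch-plugin list on the command line to confirm that the ik plugin is installed.

4. Test the IK analyzer in Kibana and compare its tokenization results with those of the icu analyzer.
ik_max_word: fine-grained tokenization that exhausts every possible split of a sentence.

ik_smart: coarse-grained tokenization that prefers the longest match (a single token in this test).

5. Enter 超级喜欢狂神说 and you will find that 狂神说 has been split apart.

If we want the system to recognize "狂神说" as a single word, we need to edit a custom dictionary.
Steps:
(1) Go to the elasticsearch/plugins/ik/config directory
(2) Create a file my.dic with the following content:

狂神说

(3) Edit IKAnalyzer.cfg.xml (in the ik/config directory)

<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Configure your own extension dictionary here -->
<entry key="ext_dict">my.dic</entry>
<!-- Configure your own extension stop-word dictionary here -->
<entry key="ext_stopwords"></entry>
</properties>

After changing the configuration, restart Elasticsearch and test again.
The startup log shows that our custom dictionary file has been picked up:

Test again, and 狂神说 is now a single token:

At this point we understand the basic rules and usage of the analyzer.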

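REST style

REST is a software architecture style, not a standard; it only provides a set of design principles and constraints. It is mainly used for software in which clients and servers interact. Software designed in this style can be simpler, more layered, and easier to equip with mechanisms such as caching.
The basic REST commands (PUT, POST, GET, DELETE) are demonstrated in the sections below.

Basic tests

1. Open http://localhost:5601/ in a browser and go to the Console in Kibana
2. Enter the following in the Console:

# PUT creates a document: index test1, type type1, document id 1
PUT /test1/type1/1
{
"name":"狂神说",
"age":16
}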

Response (returned in RESTful style):

// Warning: specifying types in document index requests is deprecated;
// use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
"_index" : "test1", // index
"_type" : "type1", // type
"_id" : "1", // id
"_version" : 1, // version
"result" : "created", // operation type
"_shards" : { // shard information
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}

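3. Does the name field need a declared type? After all, a relational database requires column types. The main field types are:
String types
text, keyword
Numeric types
long, integer, short, byte, double, float, half_float, scaled_float
Date type
date
Boolean type
boolean
Binary type
binary
and so on.
4. Specify field types when creating an index: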

PUT /test2
{
    "mappings":{
        "properties":{
            "name":{
                "type":"text"
            },
            "age":{
                "type":"long"
            },
            "birthday":{
                "type":"date"
            }
        }
    }
}

Output:

{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "test2"
}

5. Look at the index's fields:

GET test2

{
    "test2":{
        "aliases":{

        },
        "mappings":{
            "properties":{
                "age":{
                    "type":"long"
                },
                "birthday":{
                    "type":"date"
                },
                "name":{
                    "type":"text"
                }
            }
        },
        "settings":{
            "index":{
                "creation_date":"1585384302712",
                "number_of_shards":"1",
                "number_of_replicas":"1",
                "uuid":"71TUZ84wRTW5P8lKeN4I4Q",
                "version":{
                    "created":"7060199"
                },
                "provided_name":"test2"
            }
        }
    }
}

6. Look at the test3 index, which was created by indexing a document without defining a mapping first:

GET test3

Response:

{
    "test3":{
        "aliases":{

        },
        "mappings":{
            "properties":{
                "age":{
                    "type":"long"
                },
                "birth":{
                    "type":"date"
                },
                "name":{
                    "type":"text",
                    "fields":{
                        "keyword":{
                            "type":"keyword",
                            "ignore_above":256
                        }
                    }
                }
            }
        },
        "settings":{
            "index":{
                "creation_date":"1585384497051",
                "number_of_shards":"1",
                "number_of_replicas":"1",
                "uuid":"xESBKF1XTpCAZOgMqBNUbA",
                "version":{
                    "created":"7060199"
                },
                "provided_name":"test3"
            }
        }
    }
}

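As the example above shows, when we do not specify field types, es assigns default types to the fields for us.
Comparison with a relational database:
PUT test1/type1/1: the index test1 corresponds to a database, the type type1 corresponds to a table, and 1 is the primary key id of the row.
One addition: before Elasticsearch 6.0 an index could contain multiple types, but an index created in 6.0 or later can have only one type. The id corresponds to the primary key in a relational database; if it is not specified, Elasticsearch generates a 20-character unique id by default. Properties correspond to columns in a relational database.
The result field in the response is the operation type. It is created now, meaning the document was created for the first time. If you run the same command again, result becomes updated. If you look closely, _version starts at 1 and increases by one every time you run the command, indicating how many times the document has been changed.
7. Let's learn one more command, which shows the state of the indices in Elasticsearch:

GET _cat/indices?v

Response: the health status, shard counts, store size, and other details of all our indices.

8. How do we delete an index (a database)?

DELETE /test1

{
"acknowledged" : true
}

An acknowledged value of true means the index was deleted successfully.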

CRUD commands

First document:

PUT /kuangshen/user/1
{
"name":"狂神说",
"age":18,
"desc":"一顿操作猛如虎，一看工资2500",
"tags":["直男","技术宅","温暖"]
}

Second document:

PUT /kuangshen/user/2
{
"name":"张三",
"age":3,
"desc":"法外狂徒",
"tags":["渣男","旅游","交友"]
}

Third document:

PUT /kuangshen/user/3
{
"name":"李四",
"age":30,
"desc":"mmp，不知道怎么形容",
"tags":["靓女","旅游","唱歌"]
}

Check the data:

Note: when the command is executed, the document is created if it does not exist and replaced if it already exists.
Let's query it with GET:

GET kuangshen/user/1

Response:

{
    "_index":"kuangshen",
    "_type":"user",
    "_id":"1",
    "_version":1,
    "_seq_no":0,
    "_primary_term":1,
    "found":true,
    "_source":{
        "name":"狂神说",
        "age":18,
        "desc":"一顿操作猛如虎，一看工资2500",
        "tags":[
            "直男",
            "技术宅",
            "温暖"
        ]
    }
}

To update a document, you can overwrite it with PUT:

PUT /kuangshen/user/1
{
"name":"狂神说Java",
"age":18,
"desc":"一顿操作猛如虎，一看工资2.5",
"tags":["直男","技术宅","温暖"]
}

Response:

{
"_index" : "kuangshen",
"_type" : "user",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}

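The document has been modified, so PUT can update data. The drawback is that PUT replaces the whole document: you have to send every field again, including the ones that did not change, and any field you leave out is lost.

Updating data with POST

Use the POST command with _update after the id, and put the fields to change inside doc. Only those fields are modified. (The typeless form of this endpoint in 7.x is POST /kuangshen/_update/1.)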

POST /kuangshen/user/1/_update
{
"doc":{
"name":"狂神说Java",
"desc":"关注狂神公众号每日更新文章哦"
}
}

Response:

{
    "_index":"kuangshen",
    "_type":"user",
    "_id":"1",
    "_version":3,
    "result":"updated",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":4,
    "_primary_term":1
}
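
The difference between replacing a document with PUT and partially updating it with _update can be sketched with an in-memory map. This is a toy model under stated assumptions, not Elasticsearch code: DocStoreSketch and its methods are hypothetical names, and only the result and version bookkeeping seen in the responses above is modeled, plus the noop result that Elasticsearch returns by default when a partial update changes nothing.

```java
import java.util.*;

class DocStoreSketch {
    private final Map<String, Map<String, Object>> sources = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    // Models PUT: creates the document, or replaces it entirely if it already exists.
    String put(String id, Map<String, Object> doc) {
        boolean existed = sources.containsKey(id);
        sources.put(id, new LinkedHashMap<>(doc));
        versions.merge(id, 1, Integer::sum);
        return existed ? "updated" : "created";
    }

    // Models POST _update with "doc": merges the given fields into the stored document.
    String update(String id, Map<String, Object> partial) {
        Map<String, Object> current = sources.get(id);
        if (current == null) {
            throw new NoSuchElementException("no document with id " + id);
        }
        boolean changed = false;
        for (Map.Entry<String, Object> e : partial.entrySet()) {
            if (!Objects.equals(current.get(e.getKey()), e.getValue())) {
                changed = true;
            }
        }
        if (!changed) {
            return "noop"; // nothing to change, version stays the same
        }
        current.putAll(partial);
        versions.merge(id, 1, Integer::sum);
        return "updated";
    }

    Map<String, Object> get(String id) {
        return sources.get(id);
    }

    int version(String id) {
        return versions.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        DocStoreSketch store = new DocStoreSketch();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "狂神说");
        doc.put("age", 18);
        System.out.println(store.put("1", doc));   // first PUT creates the document

        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("name", "狂神说Java");
        System.out.println(store.update("1", partial)); // only name changes, age is kept
        System.out.println(store.get("1") + " version=" + store.version("1"));
    }
}
```

With put, a field that is missing from the new body disappears from the stored document; with update, untouched fields survive. That is why the article switches to _update for small changes.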

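Conditional queries with _search?q=

We have already been using the simplest kind of query without noticing:

GET kuangshen/user/1

Now let's learn the conditional query _search?q=

GET kuangshen/user/_search?q=name:狂神说

_search?q=name:狂神说 returns the documents whose name field matches 狂神说.
Don't forget the ? separator between _search and the q parameter.

Response: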

{
    "took":16,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":1,
            "relation":"eq"
        },
        "max_score":1.4229509,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":1.4229509,
                "_source":{
                    "name":"狂神说Java",
                    "age":18,
                    "desc":"关注狂神公众号每日更新文章哦",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                }
            }
        ]
    }
}

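The result is not just the data itself: the matching documents are wrapped in hits, and each carries a _score. The score is computed by the relevance algorithm; the better a document matches the query, the higher its score.

Building queries with the query DSL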

GET kuangshen/user/_search
{
"query":{
"match":{
"name": "狂神"
}
}
}

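In the example above the query is built up step by step: the condition is simply placed inside match. The response contains the same document as before: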

{
    "took":0,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":1,
            "relation":"eq"
        },
        "max_score":1.6285465,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":1.6285465,
                "_source":{
                    "name":"狂神说Java",
                    "age":18,
                    "desc":"关注狂神公众号每日更新文章哦",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                }
            }
        ]
    }
}

We can also query everything:

# A search with no conditions
GET kuangshen/user/_search

# The equivalent search with an explicit match_all
GET kuangshen/user/_search
{
"query":{
"match_all": {}
}
}

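The value of match_all is empty, meaning there is no query condition, just like select * from table_name.

Response: all documents are returned.
Suppose we only need the name and desc fields and nothing else. How do we do that?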

GET kuangshen/user/_search
{
"query":{
"match_all": {}
},
"_source": ["name","desc"]
}

As the example shows, _source restricts the response to the name and desc fields.

{
    "took":1,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":3,
            "relation":"eq"
        },
        "max_score":1,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"2",
                "_score":1,
                "_source":{
                    "name":"张三",
                    "desc":"法外狂徒"
                }
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"3",
                "_score":1,
                "_source":{
                    "name":"李四",
                    "desc":"mmp，不知道怎么形容"
                }
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":1,
                "_source":{
                    "name":"狂神说Java",
                    "desc":"关注狂神公众号每日更新文章哦"
                }
            }
        ]
    }
}

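In general we recommend the query DSL. Queries issued from application code are also built this way, because it can express much more complex conditions and is easier to read.

Sorting

Sorting can be ascending or descending. Let's start with descending: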

GET kuangshen/user/_search
{
"query":{
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}

In this example we add sort on top of the query: the sort field is age and order is desc (descending).

{
    "took":0,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":3,
            "relation":"eq"
        },
        "max_score":null,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"3",
                "_score":null,
                "_source":{
                    "name":"李四",
                    "age":30,
                    "desc":"mmp，不知道怎么形容",
                    "tags":[
                        "靓女",
                        "旅游",
                        "唱歌"
                    ]
                },
                "sort":[
                    30
                ]
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":null,
                "_source":{
                    "name":"狂神说Java",
                    "age":18,
                    "desc":"关注狂神公众号每日更新文章哦",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                },
                "sort":[
                    18
                ]
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"2",
                "_score":null,
                "_source":{
                    "name":"张三",
                    "age":3,
                    "desc":"法外狂徒",
                    "tags":[
                        "渣男",
                        "旅游",
                        "交友"
                    ]
                },
                "sort":[
                    3
                ]
            }
        ]
    }
}

For ascending order, change desc to asc:

GET kuangshen/user/_search
{
"query":{
"match_all": {}
},
"sort": [
{
"age": {
"order": "asc"
}
}
]
}

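Note: you can only sort on sortable fields. Which fields are sortable?
Numbers
Dates
IDs
Analyzed text fields cannot be sorted by default; use a keyword field for strings instead.

Pagination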

# from is the offset of the first hit to return (0-based); size is the number of hits to return
GET kuangshen/user/_search
{
"query":{
"match_all": {}
},
"sort": [
{
"age": {
"order": "asc"
}
}
],
"from": 0,
"size": 1
}

Response:

{
    "took":0,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":3,
            "relation":"eq"
        },
        "max_score":null,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"2",
                "_score":null,
                "_source":{
                    "name":"张三",
                    "age":3,
                    "desc":"法外狂徒",
                    "tags":[
                        "渣男",
                        "旅游",
                        "交友"
                    ]
                },
                "sort":[
                    3
                ]
            }
        ]
    }
}

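Only one document is returned: we start at offset 0 and return one hit. Try other values as well.
By now our query has gained more and more parts: we started with a simple query, then added conditions, sorting, and limits on the returned results. So we can say that in Elasticsearch all parts of a query are pluggable and are separated from each other by commas. For example, a query that only limits the returned results: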

# from: offset of the first hit; size: number of hits to return
GET kuangshen/user/_search
{
"query":{
"match_all": {}
},
"from": 0,
"size": 1
}
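
In application code the from offset is usually derived from a page number. The following is a minimal sketch, assuming 1-based page numbers; PageSketch, from, and searchBody are hypothetical helper names, and the body is assembled as a plain string only to show the arithmetic.

```java
class PageSketch {
    // Converts a 1-based page number into the 0-based "from" offset.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Builds a match_all search body for the requested page.
    static String searchBody(int page, int size) {
        return String.format(
                "{\"query\":{\"match_all\":{}},\"from\":%d,\"size\":%d}",
                from(page, size), size);
    }

    public static void main(String[] args) {
        System.out.println(searchBody(1, 1));   // first page, one hit per page
        System.out.println(searchBody(3, 10));  // third page, ten hits per page
    }
}
```

Keep in mind that by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so this style of paging suits the first pages of a result set rather than deep scrolling.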

Boolean queries

First add one more document:

PUT /kuangshen/user/4
{
"name":"狂神说",
"age":3,
"desc":"一顿操作猛如虎，一看工资2500",
"tags":["直男","技术宅","温暖"]
}

must (and)

Find all documents whose name matches "狂神说" and whose age is 3:

GET kuangshen/user/_search
{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "name":"狂神说"
                    }
                },
                {
                    "match":{
                        "age":3
                    }
                }
            ]
        }
    }
}

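We use must inside bool to express the conditions. Look at the result: it behaves like and, since every condition must be satisfied.

should (or)
What if we want documents whose name matches 狂神说 or whose age is 18?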

GET kuangshen/user/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "狂神说"
}
},
{
"match": {
"age": 18
}
}
]
}
}
}

Response:

{
    "took":0,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":2,
            "relation":"eq"
        },
        "max_score":3.1522982,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":3.1522982,
                "_source":{
                    "name":"狂神说Java",
                    "age":18,
                    "desc":"关注狂神公众号每日更新文章哦",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                }
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"4",
                "_score":2.4708953,
                "_source":{
                    "name":"狂神说",
                    "age":3,
                    "desc":"一顿操作猛如虎，一看工资2500",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                }
            }
        ]
    }
}

The response includes a document with age 3 that only matches the name condition, so should behaves like or.
must_not (not)
Find the documents whose age is not 18:

GET kuangshen/user/_search
{
    "query":{
        "bool":{
            "must_not":[
                {
                    "match":{
                        "age":18
                    }
                }
            ]
        }
    }
}

filter
Find documents whose name matches 狂神 and whose age is greater than 10:

GET kuangshen/user/_search
{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "name":"狂神"
                    }
                }
            ],
            "filter":{
                "range":{
                    "age":{
                        "gt":10
                    }
                }
            }
        }
    }
}

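Here we use filter to restrict the results. The range clause expresses the range condition, and gt means greater than, here greater than 10.
The available operators are:
gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
How do we find documents whose name matches 狂神 and whose age is between 25 and 30?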

GET kuangshen/user/_search
{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "name":"狂神"
                    }
                }
            ],
            "filter":{
                "range":{
                    "age":{
                        "gte":25,
                        "lte":30
                    }
                }
            }
        }
    }
}
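
The meaning of the four range operators can be checked with a small predicate. This is a sketch of the semantics only, assuming integer values; RangeSketch and range are hypothetical names, and nothing here talks to Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

class RangeSketch {
    // Builds a predicate from range operators; every given bound must hold.
    static IntPredicate range(Map<String, Integer> bounds) {
        return value -> {
            for (Map.Entry<String, Integer> e : bounds.entrySet()) {
                int bound = e.getValue();
                boolean ok;
                switch (e.getKey()) {
                    case "gt":  ok = value > bound;  break;
                    case "gte": ok = value >= bound; break;
                    case "lt":  ok = value < bound;  break;
                    case "lte": ok = value <= bound; break;
                    default:
                        throw new IllegalArgumentException("unknown operator: " + e.getKey());
                }
                if (!ok) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> bounds = new LinkedHashMap<>();
        bounds.put("gte", 25);
        bounds.put("lte", 30);
        IntPredicate between25And30 = range(bounds);
        // Ages of the sample documents used in this article
        for (int age : new int[] {3, 18, 30}) {
            System.out.println(age + " -> " + between25And30.test(age));
        }
    }
}
```

Among the sample ages 3, 18, and 30, only 30 satisfies gte 25 and lte 30, so only documents with that age pass the filter before the name condition is considered.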

Searching by tag

Find documents whose tags contain 男:

GET kuangshen/user/_search
{
"query":{
"match": {
"tags": "男"
}
}
}

All documents whose tags contain 男 are returned.
Since we are searching by tag, can we give several tags at once, and how?

GET kuangshen/user/_search
{
"query":{
"match": {
"tags": "男 技术"
}
}
}

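Response: a document is returned as soon as it matches any one of the space-separated terms.

Exact matching with term queries

A term query looks up the given term directly in the inverted index, which makes it an exact match.

Differences between term and match:
match goes through analysis: the query text is first processed by an analyzer, and different analyzers produce different tokens; matching is then done on the resulting tokens.
term is not analyzed: it looks up the exact value directly in the inverted index.
Note: we are using es7, so when we use mappings properties to assign types to several fields, we must not specify a type for the index itself: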

PUT testdb
{
    "mappings":{
        "properties":{
            "name":{
                "type":"text"
            },
            "desc":{
                "type":"keyword"
            }
        }
    }
}
# Insert documents
PUT testdb/_doc/1
{
    "name":"狂神说Java name",
    "desc":"狂神说Java desc"
}
PUT testdb/_doc/2
{
    "name":"狂神说Java name",
    "desc":"狂神说Java desc2"
}

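In the testdb index above, the name field is of type text, so it is processed by an analyzer and queries match against the resulting tokens. The desc field is of type keyword, so it is not processed by an analyzer.
Let's verify this: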

GET _analyze
{
"analyzer": "keyword",
"text": "狂神说Java name"
}

Result:

{
    "tokens":[
        {
            "token":"狂神说Java name",
            "start_offset":0,
            "end_offset":12,
            "type":"word",
            "position":0
        }
    ]
}

The text was not analyzed; it stays a single string. Now test the standard analyzer:

GET _analyze
{
"analyzer": "standard",
"text": "狂神说Java name"
}

Result:

{
    "tokens":[
        {
            "token":"狂",
            "start_offset":0,
            "end_offset":1,
            "type":"<IDEOGRAPHIC>",
            "position":0
        },
        {
            "token":"神",
            "start_offset":1,
            "end_offset":2,
            "type":"<IDEOGRAPHIC>",
            "position":1
        },
        {
            "token":"说",
            "start_offset":2,
            "end_offset":3,
            "type":"<IDEOGRAPHIC>",
            "position":2
        },
        {
            "token":"java",
            "start_offset":3,
            "end_offset":7,
            "type":"<ALPHANUM>",
            "position":3
        },
        {
            "token":"name",
            "start_offset":8,
            "end_offset":12,
            "type":"<ALPHANUM>",
            "position":4
        }
    ]
}

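This time the string was split into tokens.
Summary: keyword fields are not processed by an analyzer.
Now let's run some queries:

# name is a text field: it was analyzed at index time, so the single token 狂 matches
GET testdb/_search
{
    "query":{
        "term":{
            "name":"狂"
        }
    }
}
# desc is a keyword field: it is not analyzed, so only the complete value matches
GET testdb/_search
{
    "query":{
        "match":{
            "desc":"狂神说Java desc"
        }
    }
}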
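
The difference between term and match, and between text and keyword fields, can be imitated in plain Java. This is a rough sketch under stated assumptions: TermVsMatchSketch is a hypothetical class, and analyze is only a stand-in for the standard analyzer that reproduces the tokens shown above (lowercased, one token per CJK character, one token per run of ASCII letters or digits); the real analyzer follows Unicode segmentation rules.

```java
import java.util.*;

class TermVsMatchSketch {
    // Rough stand-in for the standard analyzer, sufficient for the sample text.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toLowerCase().toCharArray()) {
            if (c < 128 && Character.isLetterOrDigit(c)) {
                word.append(c);                       // extend the current ASCII word
            } else {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (Character.isLetterOrDigit(c)) {
                    tokens.add(String.valueOf(c));    // non-ASCII letter, e.g. a CJK character
                }
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    // term: the query value is looked up as-is among the indexed tokens.
    static boolean term(List<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    // match: the query text is analyzed first; any shared token counts as a hit.
    static boolean match(List<String> indexedTokens, String queryText) {
        for (String token : analyze(queryText)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // text field: analyzed at index time
        List<String> name = analyze("狂神说Java name");
        // keyword field: the whole value is indexed as one token
        List<String> desc = Collections.singletonList("狂神说Java desc");

        System.out.println(name);
        System.out.println(term(name, "狂"));               // single token exists in the text field
        System.out.println(term(name, "狂神说Java name"));    // whole string is not a token there
        System.out.println(term(desc, "狂神说Java desc"));    // exact keyword value
        System.out.println(term(desc, "狂"));               // part of a keyword does not match
    }
}
```

The sketch shows why a term query with the full string finds nothing in a text field, while the same query on a keyword field succeeds only with the complete value.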

Finding multiple exact values (terms)
Official documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_finding_multiple_exact_values.html

PUT testdb/_doc/3
{
"t1": "22",
"t2": "2020-4-16"
}
PUT testdb/_doc/4
{
"t1": "33",
"t2": "2020-4-17"
}
# Query: exact match on multiple values
GET testdb/_search
{
    "query":{
        "bool":{
            "should":[
                {
                    "term":{
                        "t1":"22"
                    }
                },
                {
                    "term":{
                        "t1":"33"
                    }
                }
            ]
        }
    }
}

Besides a bool query, the terms query does the same thing:

GET testdb/_doc/_search
{
"query": {
"terms": {
"t1": ["22", "33"]
}
}
}

Highlighting

GET kuangshen/user/_search
{
"query":{
"match": {
"name": "狂神"
}
},
"highlight" :{
"fields": {
"name":{}
}
}
}

Response:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.

{
    "took":62,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":{
            "value":2,
            "relation":"eq"
        },
        "max_score":1.6472635,
        "hits":[
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"4",
                "_score":1.6472635,
                "_source":{
                    "name":"狂神说",
                    "age":3,
                    "desc":"一顿操作猛如虎，一看工资2500",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                },
                "highlight":{
                    "name":[
                        "<em>狂</em><em>神</em>说"
                    ]
                }
            },
            {
                "_index":"kuangshen",
                "_type":"user",
                "_id":"1",
                "_score":1.4348655,
                "_source":{
                    "name":"狂神说Java",
                    "age":18,
                    "desc":"关注狂神公众号每日更新文章哦",
                    "tags":[
                        "直男",
                        "技术宅",
                        "温暖"
                    ]
                },
                "highlight":{
                    "name":[
                        "<em>狂</em><em>神</em>说Java"
                    ]
                }
            }
        ]
    }
}

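We can see that the matched characters 狂 and 神 have each been wrapped in an <em> tag.
These are the tags es adds by default. We can also define our own style: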

GET kuangshen/user/_search
{
"query":{
"match": {
"name": "狂神"
}
},
"highlight" :{
"pre_tags": "<b class='key' style='color:red'>",
"post_tags": "</b>",
"fields": {
"name":{}
}
}
}

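Note: quotation marks inside the custom tags (around attribute or style values) must be single quotes, to distinguish them from the double quotes of the surrounding es JSON syntax.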
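
What pre_tags and post_tags do to the returned fragment can be imitated in plain Java. This is a deliberately simplified sketch, not the Elasticsearch highlighter: HighlightSketch and highlight are hypothetical names, and matching is done per character because the default analyzer turned each Chinese character into its own token in the response above.

```java
class HighlightSketch {
    // Wraps every character of text that also occurs in query with the given tags.
    static String highlight(String text, String query, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c) && query.indexOf(c) >= 0) {
                out.append(preTag).append(c).append(postTag);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Default tags, as in the first highlight response
        System.out.println(highlight("狂神说Java", "狂神", "<em>", "</em>"));
        // Custom tags, as in the second request (single quotes inside the tag)
        System.out.println(highlight("狂神说Java", "狂神",
                "<b class='key' style='color:red'>", "</b>"));
    }
}
```

With the default tags the first call reproduces the fragment from the response, where 狂 and 神 are wrapped separately. A front end would insert the fragment as HTML, so the tags and their style are applied by the browser.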

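About the Deprecation warning

From its first release, Elasticsearch stored every document in an index and assigned it a mapping type, which represented the type of the document or entity being indexed. This caused a number of problems, so since Elasticsearch 6.0.0 an index can contain only one mapping type, in 7.0.0 mapping types are deprecated, and in 8.0.0 they are removed completely.
Just remember that an index can have only one type, in which every field name is unique. If no document type is specified when the mapping is created, the index uses the default type _doc. If no document id is specified, Elasticsearch generates an id string for us.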

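Creating indices and documents through the API

Finding the documentation

Most es tutorials online are quite old, es has many versions with sometimes significant differences between them, and es itself offers several APIs, so many articles contain a confusing mix of examples. I therefore gave up on them and went to the official site instead. Here I use Elasticsearch 7.6.1, the latest version at the time of writing:
1. Open the official es documentation at https://www.elastic.co/guide/index.html
2. Find Elasticsearch Clients (this is the client API documentation)

3. We use the Java REST client API. You can pick the documentation for your own version under other versions.

4. The REST client comes in a high-level and a low-level flavor. We go straight to Getting started under the high-level client

5. Read on to find the Maven dependency and the basic configuration.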

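About the Java REST Client

The Java REST Client comes in two flavors:
Java Low Level REST Client: the official low-level client for Elasticsearch. It communicates with an Elasticsearch cluster over HTTP and leaves request marshalling and response unmarshalling to the user. It is compatible with all Elasticsearch versions. (PS: if you have worked with WebService, marshalling and unmarshalling will be familiar: think of them as packaging the request parameters and parsing the response.)
Java High Level REST Client: the official high-level client for Elasticsearch. It is built on the low-level client, exposes many APIs, and takes care of request marshalling and response unmarshalling. (PS: with one you pass a string you assembled yourself and parse the result yourself; with the other you pass objects and get the result back already wrapped in objects, which standardizes parameter names and formats and is more object-oriented.)
(PS: an analogy I find apt for low-level versus high-level is procedural versus object-oriented programming.)

Many tutorials online are old and use TransportClient. TransportClient is deprecated in Elasticsearch 7.0 and will be removed completely in 8.0. The official recommendation is therefore the Java High Level REST Client, which executes HTTP requests rather than serialized Java requests. So here we use the high-level client directly.

Setting up the project dependencies

1. Create a Spring Boot (2.2.5) project named kuang-elasticsearch and add the web dependency.
2. Configure the es dependency: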

<properties>
<java.version>1.8</java.version>
<!-- The Elasticsearch version managed by Spring Boot does not match ours, so we set it explicitly -->
<elasticsearch.version>7.6.1</elasticsearch.version>
</properties>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

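3. Read on to the Initialization section of the documentation: we need to build a RestHighLevelClient object.

// Build the client (one HttpHost per node to connect to)
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("localhost", 9200, "http"),
new HttpHost("localhost", 9201, "http")));
// ... perform operations ...
// The high-level client internally creates the low-level client used to execute
// requests, based on the provided builder. The low-level client maintains a
// connection pool and starts some threads, so close the high-level client when you
// are done with it; it in turn closes the low-level client and frees those resources.
client.close();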

4. Write a configuration class that provides this client as a bean:

package com.llp.elasticsearch.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RestHighLevelClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }

}


API tests

package com.llp.elasticsearch;

import com.alibaba.fastjson.JSON;
import com.llp.elasticsearch.entity.User;
import org.assertj.core.util.Lists;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class ElasticsearchApplicationTests {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    /**
     * Create an index
     * @throws IOException
     */
    @Test
    void testCreateIndex() throws IOException {
        CreateIndexRequest request = new CreateIndexRequest("llp_index");
        CreateIndexResponse createIndexResponse
                =client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }

    /**
     * Get an index
     * @throws IOException
     */
    @Test
    void testGetIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("llp_index");
        // Check existence first: get() throws if the index is missing
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
        if (exists) {
            GetIndexResponse getIndexResponse = client.indices().get(request, RequestOptions.DEFAULT);
            System.out.println(getIndexResponse);
        }
    }

    /**
     * Delete an index
     * @throws IOException
     */
    @Test
    void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("llp_index");
        AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());
        System.out.println(delete);
    }

    // Index (create) a document
    @Test
    void testAddDocument() throws IOException {
        // Create the object to store
        User user = new User("狂神说",3);
        // Build the index request
        IndexRequest request = new IndexRequest("llp_index");
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(2));
        request.source(JSON.toJSONString(user), XContentType.JSON);
        IndexResponse indexResponse = client.index(request,RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status());
    }

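    // Check whether a document with this id exists in the index
    @Test
    void testIsExists() throws IOException {
        GetRequest getRequest = new GetRequest("llp_index","1");
        // Skip fetching _source and stored fields; only existence matters here
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");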
        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    // Get a document
    @Test
    void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest("llp_index","1");
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.isExists());
        System.out.println(getResponse.getSourceAsString());
        System.out.println(getResponse);
    }

    // Update a document
    @Test
    void testUpdateDocument() throws IOException {
        UpdateRequest request = new UpdateRequest("llp_index","1");
        request.timeout(TimeValue.timeValueSeconds(2));
        User user = new User("孙悟空",28);
        request.doc(JSON.toJSONString(user),XContentType.JSON);
        UpdateResponse updateResponse = client.update(request,RequestOptions.DEFAULT);
        System.out.println(updateResponse.status());
    }

    // Delete a document
    @Test
    void testDelete() throws IOException {
        DeleteRequest request = new DeleteRequest("llp_index","1");
        request.timeout(TimeValue.timeValueSeconds(2));
        DeleteResponse delete = client.delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.status());
    }

    // Bulk-insert documents
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout(TimeValue.timeValueMinutes(2));
        List<User> users = Lists.newArrayList();
        users.add(new User("孙悟空",1));
        users.add(new User("猪八戒",1));
        users.add(new User("沙悟净",1));
        users.add(new User("唐僧",1));
        users.add(new User("白骨精",1));
        users.add(new User("蜘蛛精",1));
        for (int i = 0; i < users.size(); i++) {
            bulkRequest.add(new IndexRequest("llp_index").id(""+(i+1))
                    .source(JSON.toJSONString(users.get(i)),XContentType.JSON));
        }
        BulkResponse bulk = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        // true if any item in the bulk request failed
        System.out.println(bulk.hasFailures());
    }

    // Search test
    @Test
    void testSearch() throws IOException {
        SearchRequest request = new SearchRequest("llp_index");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "孙悟空");
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "孙");
//        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(matchQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        request.source(searchSourceBuilder);
        SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(searchResponse.getHits()));
        for (SearchHit documentFields : searchResponse.getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }




}
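
Why does the match query for 孙 find 孙悟空, while the commented-out term query for the full name would probably find nothing? Assuming the name field was mapped dynamically as text with the default standard analyzer, Chinese text is indexed one character per token, and a term query compares its input against those tokens without analyzing it. The toy model below is not Elasticsearch code; ToyAnalyzer and its methods are hypothetical names used only to illustrate the difference.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model (hypothetical, not an Elasticsearch API): approximates how the
// standard analyzer splits text, to show why term and match behave differently.
final class ToyAnalyzer {

    // Each Han character becomes its own token; runs of other letters or digits
    // become one lowercased token; everything else acts as a separator.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }

    // term query: the input is NOT analyzed; it must equal an indexed token.
    static boolean termMatches(String indexedText, String input) {
        return new HashSet<>(analyze(indexedText)).contains(input);
    }

    // match query: the input is analyzed too; with the default OR operator,
    // one shared token is enough.
    static boolean matchMatches(String indexedText, String input) {
        Set<String> indexed = new HashSet<>(analyze(indexedText));
        for (String token : analyze(input)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model a term lookup for a multi-character Chinese word never succeeds against analyzed text, which is why a Chinese analyzer plugin or the keyword sub-field is usually needed for exact matching.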


Entity class

package com.llp.elasticsearch.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class User {
    private String name;
    private int age;
}

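Hands-on project

Project setup

1. Start the Elasticsearch service and its client
2. Use Spring Boot to quickly scaffold the service

3. Override the dependency version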

<properties>
<java.version>1.8</java.version>
<!-- The version Spring Boot manages by default does not match our server, so we set it ourselves -->
<elasticsearch.version>7.6.1</elasticsearch.version>
</properties>


4. Configure the application.properties file

server.port=9090
# Disable the Thymeleaf cache
spring.thymeleaf.cache=false

5. Import the front-end assets and convert them to a format Thymeleaf supports

<html xmlns:th="http://www.thymeleaf.org">

6. Write an IndexController to test page navigation

Parsing with jsoup

1. Add the jsoup dependency

<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.13.1</version>
</dependency>

2. Write a utility class, HtmlParseUtil

package com.llp.elasticsearchjd.utils;

import com.llp.elasticsearchjd.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

@Component
public class HtmlParseUtil {

    // Quick manual check: scrape the results for "java" and print them
    public static void main(String[] args) throws Exception {
        for (Content content : new HtmlParseUtil().parseJD("java")) {
            System.out.println(content.getImg());
            System.out.println(content.getPrice());
            System.out.println(content.getTitle());
            System.out.println("================================");
        }
    }

    public List<Content> parseJD(String keyword) throws Exception {
        // jsoup cannot capture data loaded through ajax requests unless you simulate a browser yourself
        // 1. Build the search URL, e.g. https://search.jd.com/Search?keyword=java
        //    (the keyword is URL-encoded so that Chinese text and spaces are safe)
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        // 2. Fetch and parse the page (requires network access)
        Document document = Jsoup.parse(new URL(url), 30000);
        // 3. Locate the search results
        // Document mirrors the browser's JS Document object, so many of the same methods are available
        Element element = document.getElementById("J_goodsList");

        List<Content> contents = new ArrayList<>();
        if (element == null) {
            // The result container was not found (for example, the page differs from the layout assumed here)
            return contents;
        }
        // 4. Find all li elements
        Elements elements = element.getElementsByTag("li");
        // Extract the product information
        for (Element el : elements) {
            // Sites like this usually lazy-load images for performance, so the image URL
            // is read from data-lazy-img rather than src, e.g.
            // https://img13.360buyimg.com/n1/s200x200_jfs/t1/178411/10/177/290041/607f7540E3f115804/5bb57caab13b6340.jpg
            // String img = el.getElementsByTag("img").eq(0).attr("src");
            String img = "https:" + el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String title = el.getElementsByClass("p-name").eq(0).text();
            String price = el.getElementsByClass("p-price").eq(0).text();
            contents.add(new Content(img, title, price));
        }
        return contents;
    }
}
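
Since the keyword ends up inside a query string, it should be percent-encoded before the request is made; otherwise Chinese keywords, spaces, or characters such as & can produce a malformed URL. A minimal sketch using only the JDK follows. The base URL is the one from the article, while SearchUrlBuilder and build are illustrative names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build the search URL with a percent-encoded keyword.
final class SearchUrlBuilder {
    private static final String BASE = "https://search.jd.com/Search?keyword=";

    static String build(String keyword) {
        try {
            // URLEncoder performs form encoding (a space becomes '+'), which is
            // acceptable inside a query string. The charset-name overload keeps
            // the code compatible with Java 8.
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support
            throw new IllegalStateException(e);
        }
    }
}
```

Calling SearchUrlBuilder.build("java") leaves the keyword untouched, while non-ASCII input is converted to its UTF-8 percent-encoded form.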

3. Create an entity class to hold the scraped data

package com.llp.elasticsearchjd.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
    private String img;
    private String title;
    private String price;
}


Business logic

1. Add the ElasticsearchClientConfig configuration class

package com.llp.elasticsearchjd.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchClientConfig {
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }
}

2. Write the service

package com.llp.elasticsearchjd.service;

import com.alibaba.fastjson.JSON;
import com.llp.elasticsearchjd.pojo.Content;
import com.llp.elasticsearchjd.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@Service
public class ContentService {
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    public Boolean parseContent(String keyword) throws Exception {
        // Look up the jd_goods index
        GetIndexRequest getIndexRequest = new GetIndexRequest("jd_goods");
        boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        // Create the index if it does not exist yet
        if (!exists) {
            CreateIndexRequest createIndexRequest = new CreateIndexRequest("jd_goods");
            client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        }
        // Scrape and parse the search results for the keyword
        List<Content> contents = new HtmlParseUtil().parseJD(keyword);
        if (contents.isEmpty()) {
            // Nothing was parsed; an empty bulk request would be rejected
            return false;
        }
        // Add every parsed item to a single bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout(TimeValue.timeValueMinutes(2));
        //bulkRequest.timeout("2m");
        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    public List<Map<String, Object>> searchContentPage(String keyword, int pageNo, int pageSize) throws IOException {
        // Basic parameter check
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Look up the jd_goods index
        GetIndexRequest getIndexRequest = new GetIndexRequest("jd_goods");
        boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        // Create the index if it does not exist yet
        if (!exists) {
            CreateIndexRequest createIndexRequest = new CreateIndexRequest("jd_goods");
            client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        }
        // Basic conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Pagination: from() is a zero-based offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize);
        sourceBuilder.size(pageSize);
        // Term query: the keyword is not analyzed and must equal an indexed token exactly.
        // Use QueryBuilders to configure whatever query fits your needs.
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",
                keyword);
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest,
                RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields : response.getHits().getHits()) {
            list.add(documentFields.getSourceAsMap());
        }
        return list;
    }

    public List<Map<String, Object>> highlightSearch(String keyword, int pageNo, int pageSize) throws IOException {
        // Basic parameter check
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Look up the jd_goods index
        GetIndexRequest getIndexRequest = new GetIndexRequest("jd_goods");
        boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        // Create the index if it does not exist yet
        if (!exists) {
            CreateIndexRequest createIndexRequest = new CreateIndexRequest("jd_goods");
            client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        }
        // Basic conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Pagination: from() is a zero-based offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize);
        sourceBuilder.size(pageSize);
        // Term query: the keyword is not analyzed and must equal an indexed token exactly.
        // Use QueryBuilders to configure whatever query fits your needs.
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",
                keyword);
        sourceBuilder.query(termQueryBuilder);

        // Build the highlighter
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false); // set to false if multiple fields should be highlighted
        highlightBuilder.preTags("<span style=\"color:red\">"); // tag inserted before each highlighted term
        highlightBuilder.postTags("</span>");
        sourceBuilder.highlighter(highlightBuilder);

        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse response = client.search(searchRequest,
                RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : response.getHits().getHits()) {
            // Get the highlighted fields
            Map<String, HighlightField> highlightFields =
                    hit.getHighlightFields();
            HighlightField titleField = highlightFields.get("title");
            Map<String, Object> source = hit.getSourceAsMap();
            // Always check for null: a hit without highlighted content for this field
            // would otherwise cause a NullPointerException
            if (titleField != null) {
                Text[] fragments = titleField.fragments();
                StringBuilder name = new StringBuilder();
                for (Text text : fragments) {
                    name.append(text);
                }
                source.put("title", name.toString()); // replace the original title with the highlighted one
            }
            list.add(source);
        }
        return list;
    }
}
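
The null check and fragment joining in highlightSearch can be isolated into a small helper. The sketch below uses plain Java types so that it runs without Elasticsearch on the classpath; HighlightMerger is a hypothetical name, and String[] stands in for the Text[] that the service gets from titleField.fragments().

```java
import java.util.HashMap;
import java.util.Map;

// Sketch with plain Java types: fragments are modeled as String[] here,
// whereas the service receives them as Text[] from HighlightField.
final class HighlightMerger {

    // Returns a copy of source whose field is replaced by the joined fragments,
    // or an unchanged copy when there is nothing to highlight.
    static Map<String, Object> merge(Map<String, Object> source, String field, String[] fragments) {
        Map<String, Object> merged = new HashMap<>(source);
        if (fragments == null || fragments.length == 0) {
            return merged; // keep the original value, avoiding a NullPointerException
        }
        StringBuilder joined = new StringBuilder();
        for (String fragment : fragments) {
            joined.append(fragment);
        }
        merged.put(field, joined.toString());
        return merged;
    }
}
```

Working on a copy leaves the original source map untouched, which makes the helper easy to unit-test without a running cluster.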


3. Controller

package com.llp.elasticsearchjd.controller;

import com.llp.elasticsearchjd.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;
import java.util.Map;

@RestController
public class ContentController {
    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws Exception {
        return contentService.parseContent(keyword);
    }

    //http://localhost:9090/search/java/1/10
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws Exception {
        return contentService.searchContentPage(keyword, pageNo, pageSize);
    }

    @GetMapping("/highlight/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> highlightSearch(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws Exception {
        return contentService.highlightSearch(keyword, pageNo, pageSize);
    }

}
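
The example URL above, /search/java/1/10, passes a 1-based page number, while SearchSourceBuilder.from expects a zero-based document offset. A minimal sketch of the conversion follows; PageMath and offset are illustrative names, not part of the project.

```java
// Sketch: translate a 1-based page number into the zero-based offset
// that SearchSourceBuilder.from(int) expects.
final class PageMath {
    static int offset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);   // treat 0 or negative pages as page 1
        int size = Math.max(pageSize, 1); // guard against a non-positive page size
        return Math.multiplyExact(page - 1, size); // throws ArithmeticException on int overflow
    }
}
```

With this helper, page 1 starts at offset 0 and page 2 with a page size of 10 starts at offset 10, so no result is skipped or repeated between pages.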


Front-end logic and search highlighting

1. Import the vue and axios scripts

<script th:src="@{/js/axios.js}"></script>
<script th:src="@{/js/vue.min.js}"></script>

2. Initialize the Vue object and bind it to the outer div with id app

<script>
new Vue({
el: '#app',
data: {
keyword: '', // search keyword
results: [] // search results
}
})
</script>

3. Bind the search box and its related events

4. Write a method that fetches the data returned by the back end

<script>
    new Vue({
        el: '#app',
        data: {
            keyword: '',
            results: []
        },
        methods: {
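            searchKey(){
                var keyword = this.keyword;
                console.log(keyword);
                // Encode the keyword so spaces and special characters are safe in the path
                axios.get('/highlight/search/' + encodeURIComponent(keyword) + '/1/10').then(response=>{
                    console.log(response);
                    this.results = response.data;
                });
            }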
        }
    })
</script>

5. Render the returned data

6. Use the v-html directive so Vue renders the highlight markup as HTML

<!-- title -->
<p class="productTitle">
<a v-html="result.title"> </a>
</p>
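
Because v-html inserts the string as raw HTML, any markup contained in a scraped title is rendered as well. One possible approach, sketched here under the assumption that the highlighter is configured with placeholder markers instead of real tags, is to escape the fragment on the server and only then swap the markers for the span tags. SafeHighlight, PRE, and POST are hypothetical names, not part of the project.

```java
// Sketch: escape scraped text before it reaches v-html while keeping the
// highlight tags. PRE and POST are hypothetical placeholder markers that
// would be passed to preTags/postTags instead of real HTML.
final class SafeHighlight {
    static final String PRE = "\u0001";  // marker emitted before a highlighted term
    static final String POST = "\u0002"; // marker emitted after a highlighted term

    // Replaces the five characters that are significant in HTML.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escape everything first, then swap the markers for the real tags.
    static String render(String fragment) {
        return escapeHtml(fragment)
                .replace(PRE, "<span style=\"color:red\">")
                .replace(POST, "</span>");
    }
}
```

The control characters used as markers are an arbitrary choice; any strings that cannot occur in product titles would serve the same purpose.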

7. The final result
