The search engine service uses Elasticsearch.
The external web service is built with Spring Boot web.
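1.1 Elasticsearch
Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License, and it is a popular enterprise search engine. It is widely used in cloud environments, offers near-real-time search, and is stable, reliable, fast, and easy to install and use.
Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
The open-source search engines most commonly seen today are Elasticsearch and Solr, both built on Lucene. Elasticsearch is the heavier of the two and performs better in distributed environments; which one to choose depends on the business scenario and the data volume. When the data set is small, there is no need for a Lucene-based search engine at all: queries against a relational database are enough.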
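1.2 Spring Boot
Spring Boot is now the mainstream choice for web development in Java. Its advantages go beyond development: it also does well in deployment and operations, and the Spring ecosystem is so influential that a mature solution exists for almost any need.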
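1.3 ik analyzer
Chinese word segmentation is the foundation of Chinese information retrieval, and Elasticsearch does not segment Chinese well out of the box (its default standard analyzer splits Chinese text into single characters), so a Chinese analysis plugin has to be installed. This project uses ik: download it and place it in the plugins directory of the Elasticsearch installation.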
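Once the plugin is loaded, one way to see how ik segments a piece of text is Elasticsearch's `_analyze` API. The following is a minimal sketch that only builds the request (it does not send it), assuming Java 11+ and a node reachable at `http://localhost:9200`; the class name `IkAnalyzeRequest` and the base URL are illustrative assumptions, and the string escaping covers only quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class IkAnalyzeRequest {

    /** Builds the JSON body for the _analyze API; escapes only backslashes and double quotes. */
    static String body(String analyzer, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + escaped + "\"}";
    }

    /** Builds (but does not send) a POST request to <baseUrl>/_analyze. */
    static HttpRequest build(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local node; ik_max_word is the index-time analyzer used later in the mapping.
        HttpRequest request = build("http://localhost:9200", "ik_max_word", "全文检索");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("ik_max_word", "全文检索"));
    }
}
```

Sending the same body with `ik_smart` instead of `ik_max_word` lets you compare the coarse-grained and fine-grained segmentation of the two analyzers.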
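You need Elasticsearch installed, optionally Kibana, and the ik analysis plugin.
1. Install Elasticsearch from the Elasticsearch official site. The author uses 7.5.1.
2. Download the ik plugin from the ik plugin GitHub page. Make sure the ik plugin version is the same as your Elasticsearch version.
3. Under the plugins directory of the Elasticsearch installation, create a directory named ik and unzip the downloaded plugin into it. Elasticsearch loads the plugin automatically at startup.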
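Because a version mismatch between the plugin and Elasticsearch is the most common installation mistake, a small check can catch it before startup. This is a sketch only: the archive naming pattern `elasticsearch-analysis-ik-<version>.zip` is an assumption about how the release files are named, and `IkVersionCheck` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IkVersionCheck {

    // Assumed archive naming convention: elasticsearch-analysis-ik-<major>.<minor>.<patch>.zip
    private static final Pattern ARCHIVE =
            Pattern.compile("elasticsearch-analysis-ik-(\\d+\\.\\d+\\.\\d+)\\.zip");

    /** Returns true only when the archive name carries exactly the given Elasticsearch version. */
    static boolean matches(String esVersion, String archiveName) {
        Matcher m = ARCHIVE.matcher(archiveName);
        return m.matches() && m.group(1).equals(esVersion);
    }

    public static void main(String[] args) {
        System.out.println(matches("7.5.1", "elasticsearch-analysis-ik-7.5.1.zip"));
        System.out.println(matches("7.5.1", "elasticsearch-analysis-ik-7.4.0.zip"));
    }
}
```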
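The project works in four steps:
1. Fetch the data and tokenize it with the ik analysis plugin.
2. Store the data in the ES engine.
3. Search the stored data with ES queries.
4. Expose the search to the outside world through the ES Java client.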
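The four steps above can be sketched with a toy in-memory inverted index. This is not how Elasticsearch or ik work internally: the tokenizer here simply lower-cases and splits on non-alphanumeric characters as a stand-in for ik, and `TinySearchFlow` is a hypothetical class used only to make the flow concrete (Java 9+ for `Set.of`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinySearchFlow {

    private final List<String> docs = new ArrayList<>();
    // term -> ids of the documents that contain it (a minimal inverted index)
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Step 1 stand-in: lower-case and split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Step 2: store the document and register each of its terms in the index. */
    int add(String doc) {
        int id = docs.size();
        docs.add(doc);
        for (String term : tokenize(doc)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Steps 3 and 4: return every document that contains at least one query term. */
    List<String> search(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : tokenize(query)) {
            ids.addAll(index.getOrDefault(term, Set.of()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinySearchFlow flow = new TinySearchFlow();
        flow.add("Spring Boot web service");
        flow.add("Elasticsearch full-text search");
        System.out.println(flow.search("spring"));
    }
}
```

In the real project, Elasticsearch takes over the tokenizing, storing, and searching, and the Spring Boot service plays the role of `search`.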
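5.1 The entity being indexed
The following entity class is defined from the basic information of a blog post. The important field is each post's url: when a user wants to read a post found by the search, the page jumps to that url.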
package com.lbh.es.entity;

import com.fasterxml.jackson.annotation.JsonIgnore;

import javax.persistence.*;

/**
 * Index mapping for this entity (Kibana console syntax):
 *
 * PUT article
 * {
 *   "mappings": {
 *     "properties": {
 *       "author":     {"type": "text"},
 *       "content":    {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "title":      {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "createDate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
 *       "url":        {"type": "text"}
 *     }
 *   },
 *   "settings": {
 *     "index": {
 *       "number_of_shards": 1,
 *       "number_of_replicas": 2
 *     }
 *   }
 * }
 *
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Entity
@Table(name = "es_article")
public class ArticleEntity {

    // Database primary key; hidden from Jackson-serialized responses
    @Id
    @JsonIgnore
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(name = "author")
    private String author;

    @Column(name = "content", columnDefinition = "TEXT")
    private String content;

    @Column(name = "title")
    private String title;

    // Kept as a string in one of the formats declared in the mapping
    @Column(name = "createDate")
    private String createDate;

    // Address the search result links to
    @Column(name = "url")
    private String url;

    public String getAuthor() { return author; }

    public void setAuthor(String author) { this.author = author; }

    public String getContent() { return content; }

    public void setContent(String content) { this.content = content; }

    public String getTitle() { return title; }

    public void setTitle(String title) { this.title = title; }

    public String getCreateDate() { return createDate; }

    public void setCreateDate(String createDate) { this.createDate = createDate; }

    public String getUrl() { return url; }

    public void setUrl(String url) { this.url = url; }
}
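Since `createDate` is a plain string in the entity while the mapping declares the formats `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd`, it helps to produce that string from a single formatter. The sketch below uses `java.time`; `CreateDateFormat` is a hypothetical helper, and `isAccepted` is only a local approximation of the two patterns, not a guarantee of what Elasticsearch itself will accept.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class CreateDateFormat {

    // The two patterns declared in the mapping's "format" property
    private static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String format(LocalDateTime value) {
        return value.format(DATE_TIME);
    }

    static String format(LocalDate value) {
        return value.format(DATE);
    }

    /** Returns true if the text parses with either of the two patterns. */
    static boolean isAccepted(String text) {
        try {
            LocalDateTime.parse(text, DATE_TIME);
            return true;
        } catch (DateTimeParseException ignored) {
            // fall through and try the date-only pattern
        }
        try {
            LocalDate.parse(text, DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2021, 3, 3, 10, 15, 30)));
        System.out.println(isAccepted("2021/03/03"));
    }
}
```

A value built this way can be passed to `setCreateDate` before the article is indexed.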
5.2 Client configuration
The ES client is configured in Java.
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.ArrayList;
import java.util.List;

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Configuration
public class EsConfig {

    @Value("${elasticsearch.schema}")
    private String schema;

    @Value("${elasticsearch.address}")
    private String address;

    @Value("${elasticsearch.connectTimeout}")
    private int connectTimeout;

    @Value("${elasticsearch.socketTimeout}")
    private int socketTimeout;

    @Value("${elasticsearch.connectionRequestTimeout}")
    private int tryConnTimeout;

    @Value("${elasticsearch.maxConnectNum}")
    private int maxConnNum;

    @Value("${elasticsearch.maxConnectPerRoute}")
    private int maxConnectPerRoute;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Split the comma-separated address list into host:port pairs
        List<HttpHost> hostLists = new ArrayList<>();
        for (String addr : address.split(",")) {
            String[] parts = addr.split(":");
            hostLists.add(new HttpHost(parts[0], Integer.parseInt(parts[1]), schema));
        }
        // Convert to an HttpHost array
        HttpHost[] httpHost = hostLists.toArray(new HttpHost[0]);
        // Create the client builder
        RestClientBuilder builder = RestClient.builder(httpHost);
        // Timeout settings for the async HTTP client
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(connectTimeout);
            requestConfigBuilder.setSocketTimeout(socketTimeout);
            requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
            return requestConfigBuilder;
        });
        // Connection pool limits for the async HTTP client
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(maxConnNum);
            httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
            return httpClientBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}
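The address splitting in the configuration assumes every entry has the exact form `host:port`. A slightly more forgiving parser might look like the sketch below, which trims whitespace, skips empty entries, and falls back to 9200 (Elasticsearch's default HTTP port) when the port is missing. `EsAddressParser` and `HostPort` are hypothetical names that stand in for the `HttpHost` construction, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class EsAddressParser {

    static final int DEFAULT_PORT = 9200; // Elasticsearch's default HTTP port

    /** Simple host/port pair; stands in for org.apache.http.HttpHost in this sketch. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Parses "host1:9200, host2" style lists; entries without a port get DEFAULT_PORT. */
    static List<HostPort> parse(String address) {
        List<HostPort> result = new ArrayList<>();
        for (String part : address.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                result.add(new HostPort(trimmed, DEFAULT_PORT));
            } else {
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                result.add(new HostPort(trimmed.substring(0, colon), port));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example value for the elasticsearch.address property (an assumption, not from the article)
        for (HostPort hp : parse("127.0.0.1:9200, es-node-2:9201,es-node-3")) {
            System.out.println(hp.host + " -> " + hp.port);
        }
    }
}
```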
5.3 Business code
The service covers the operations for searching articles. Articles can be looked up by title, by content, and by author.
import com.google.gson.Gson;
import com.lbh.es.entity.ArticleEntity;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// ArticleRepository (the repository for ArticleEntity) is not shown in this article;
// import it from wherever it lives in your project.

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@Service
public class ArticleService {

    private static final String ARTICLE_INDEX = "article";

    @Resource
    private RestHighLevelClient client;

    @Resource
    private ArticleRepository articleRepository;

    /** Creates the article index with its settings and mapping. */
    public boolean createIndexOfArticle() {
        Settings settings = Settings.builder()
                .put("index.number_of_shards", 1)
                .put("index.number_of_replicas", 1)
                .build();
        // All five fields, including url, must sit inside "properties"
        String mapping = "{\"properties\":{"
                + "\"author\":{\"type\":\"text\"},"
                + "\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},"
                + "\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},"
                + "\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"},"
                + "\"url\":{\"type\":\"text\"}"
                + "}}";
        CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
                .settings(settings)
                .mapping(mapping, XContentType.JSON);
        CreateIndexResponse response = null;
        try {
            response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response != null) {
            System.err.println(response.isAcknowledged() ? "success" : "failed");
            return response.isAcknowledged();
        } else {
            return false;
        }
    }

    /** Deletes the whole article index (not a single article). */
    public boolean deleteArticle() {
        DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
        try {
            AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
            return response.isAcknowledged();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /** Indexes one article as a JSON document. */
    public IndexResponse addArticle(ArticleEntity article) {
        Gson gson = new Gson();
        String s = gson.toJson(article);
        // Create the index request
        IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
        // Document content
        indexRequest.source(s, XContentType.JSON);
        // Send the request over HTTP through the client
        IndexResponse re = null;
        try {
            re = client.index(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return re;
    }

    /** Copies every article from MySQL into the index. */
    public void transferFromMysql() {
        articleRepository.findAll().forEach(this::addArticle);
    }

    public List<ArticleEntity> queryByKey(String keyword) {
        // Search only the article index rather than every index on the cluster
        SearchRequest request = new SearchRequest(ARTICLE_INDEX);
        /*
         * SearchSourceBuilder holds the search parameters.
         * Unlike matchQuery, multiMatchQuery targets several fields. With a single
         * field name it behaves like matchQuery; with several, such as field1 and
         * field2, a document matches when the text is found in at least one of them.
         */
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders
                .multiMatchQuery(keyword, "author", "content", "title"));
        request.source(searchSourceBuilder);
        List<ArticleEntity> result = new ArrayList<>();
        try {
            SearchResponse search = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit : search.getHits()) {
                Map<String, Object> map = hit.getSourceAsMap();
                ArticleEntity item = new ArticleEntity();
                item.setAuthor((String) map.get("author"));
                item.setContent((String) map.get("content"));
                item.setTitle((String) map.get("title"));
                item.setUrl((String) map.get("url"));
                result.add(item);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Empty when nothing matched or the request failed
        return result;
    }

    public ArticleEntity queryById(String indexId) {
        GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
        GetResponse response = null;
        try {
            response = client.get(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response != null && response.isExists()) {
            Gson gson = new Gson();
            return gson.fromJson(response.getSourceAsString(), ArticleEntity.class);
        }
        return null;
    }
}
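To make the multi-field search easier to picture, the sketch below builds the core of the Query DSL body that a `multi_match` query over `author`, `content`, and `title` corresponds to. It is a simplified illustration: the real client serializes additional default options, the escaping handles only quotes and backslashes, and `MultiMatchJson` is a hypothetical helper that is not part of the project.

```java
import java.util.StringJoiner;

final class MultiMatchJson {

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"query":{"multi_match":{"query":...,"fields":[...]}}}. */
    static String build(String keyword, String... fields) {
        StringJoiner joined = new StringJoiner(",", "[", "]");
        for (String field : fields) {
            joined.add(quote(field));
        }
        return "{\"query\":{\"multi_match\":{\"query\":" + quote(keyword)
                + ",\"fields\":" + joined + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("spring", "author", "content", "title"));
    }
}
```

A body like this can be pasted into the Kibana console after `GET article/_search` to try a query by hand before wiring it into the service.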
5.4 External interface
This is the same as writing any web application with Spring Boot.
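import com.lbh.es.entity.ArticleEntity;
import org.elasticsearch.action.index.IndexResponse;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;
import java.util.List;
// ArticleService is the service class from section 5.3; import it from its package in your project.

/**
 * Copyright(c)lbhbinhao@163.com
 * @author liubinhao
 * @date 2021/3/3
 */
@RestController
@RequestMapping("article")
public class ArticleController {

    @Resource
    private ArticleService articleService;

    // Create the article index
    @GetMapping("/create")
    public boolean create() {
        return articleService.createIndexOfArticle();
    }

    // Delete the article index
    @GetMapping("/delete")
    public boolean delete() {
        return articleService.deleteArticle();
    }

    // Index a single article
    @PostMapping("/add")
    public IndexResponse add(@RequestBody ArticleEntity article) {
        return articleService.addArticle(article);
    }

    // Copy all articles from MySQL into the index
    @GetMapping("/transfer")
    public String transfer() {
        articleService.transferFromMysql();
        return "successful";
    }

    // Full-text search by keyword
    @GetMapping("/query")
    public List<ArticleEntity> query(String keyword) {
        return articleService.queryByKey(keyword);
    }
}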
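When calling the query endpoint from another program, the keyword has to be URL-encoded, which matters most for Chinese keywords and keywords containing spaces. The sketch below builds such a URL, assuming Java 10+ and an application listening on `http://localhost:8080` (Spring Boot's default port, an assumption here); `QueryUrl` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryUrl {

    /** Builds <baseUrl>/article/query?keyword=<url-encoded keyword>. */
    static URI build(String baseUrl, String keyword) {
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/article/query?keyword=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed local address of the Spring Boot application
        System.out.println(build("http://localhost:8080", "spring boot"));
        System.out.println(build("http://localhost:8080", "全文检索"));
    }
}
```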
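5.5 Pages
The pages use Thymeleaf, mainly because the author really does not know front-end development and understands only a little basic HTML5, so this is just a simple page that can display results.
Search page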
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>YiyiDu</title>
    <!--
        input:focus draws a blue border when the input box is focused.
        text-indent: 11px and padding-left: 11px set the distance between
        the typed text and the left border.
    -->
    <style>
        input:focus {
            border: 2px solid rgb(62, 88, 206);
        }
        input {
            text-indent: 11px;
            padding-left: 11px;
            font-size: 16px;
        }
    </style>
    <!-- Initial state of the input -->
    <style class="input/css">
        .input {
            width: 33%;
            height: 45px;
            vertical-align: top;
            box-sizing: border-box;
            border: 2px solid rgb(207, 205, 205);
            border-right: 2px solid rgb(62, 88, 206);
            border-bottom-left-radius: 10px;
            border-top-left-radius: 10px;
            outline: none;
            margin: 0;
            display: inline-block;
            background: url(/static/img/camera.jpg) no-repeat 0 0;
            background-position: 565px 7px;
            background-size: 28px;
            padding-right: 49px;
            padding-top: 10px;
            padding-bottom: 10px;
            line-height: 16px;
        }
    </style>
    <!-- Initial state of the button -->
    <style class="button/css">
        .button {
            height: 45px;
            width: 130px;
            vertical-align: middle;
            text-indent: -8px;
            padding-left: -8px;
            background-color: rgb(62, 88, 206);
            color: white;
            font-size: 18px;
            outline: none;
            border: none;
            border-bottom-right-radius: 10px;
            border-top-right-radius: 10px;
            margin: 0;
            padding: 0;
        }
    </style>
</head>
<body>
<!-- div that contains the table -->
<!-- div that contains the input and the button -->
<div style="font-size: 0px;">
    <div align="center" style="margin-top: 0px;">
        <img src="../static/img/yyd.png" th:src="@{/static/img/yyd.png}" alt="YiyiDu" width="280px" class="pic" />
    </div>
    <div align="center">
        <!-- action performs the jump to the query handler -->
        <form action="/home/query">
            <input type="text" class="input" name="keyword" />
            <input type="submit" class="button" value="Search" />
        </form>
    </div>
</div>
</body>
</html>
Search results page
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
    <meta charset="UTF-8">
    <title>xx-manager</title>
</head>
<body>
<header th:replace="search.html"></header>
<div class="container my-2">
    <ul>
        <li th:each="article : ${articles}">
            <a th:href="${article.url}" th:text="${article.author}+${article.content}"></a>
        </li>
    </ul>
</div>
<footer th:replace="footer.html"></footer>
</body>
</html>
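Walkthrough of the overall idea:
Articles are first tokenized by the ik analyzer and stored in an Elasticsearch index. When a user searches for a keyword, the business code queries that index, and the hits are returned through the external interface to the page, which shows the search results to the user. The code simply follows this flow.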