
Spring Boot + Elasticsearch: Document Content Extraction, Highlighted Tokenized Search, and Full-Text Retrieval


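Requirements

The product team wants users to be able to upload text documents such as PDF, Word, and TXT files, then search for files by attachment name or by file content (fuzzy matching), and view the file content online.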

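I. Environment

Project development environment:

  • Back-end management system: Spring Boot + MyBatis-Plus + MySQL + Elasticsearch

  • Search engine: Elasticsearch 7.9.3 + Kibana (graphical interface)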

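II. Implementation

1. Setting up the environment

Setting up Elasticsearch and Kibana is not covered here; there are plenty of guides online.

The back-end project setup is not covered either. One point matters a great deal: the version of the Java client library used to connect to Elasticsearch must match the version of the Elasticsearch server, otherwise you will run into all kinds of problems.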

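2. File content extraction

Step 1: To extract the text content of attachments with Elasticsearch, first install a plugin: the Ingest Attachment Processor Plugin.

This is only one content-extraction plugin; others exist, for example for OCR, and are worth looking up if you are interested.

The Ingest Attachment Processor Plugin is a text-extraction plugin. It builds on Elasticsearch's ingest node feature and provides the key preprocessor, attachment.

To install it, run the following command from the bin directory of the Elasticsearch installation:

elasticsearch-plugin install ingest-attachment

Because Elasticsearch is installed with Docker here, the plugin has to be installed from the bin directory inside the Elasticsearch container: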

    [root@iZuf63d0pqnjrga4pi18udZ plugins]# docker exec -it es bash
    [root@elasticsearch elasticsearch]# ls
    LICENSE.txt  NOTICE.txt  README.asciidoc  bin  config  data  jdk  lib  logs  modules  plugins
    [root@elasticsearch elasticsearch]# cd bin/
    [root@elasticsearch bin]# ls
    elasticsearch          elasticsearch-certutil  elasticsearch-croneval  elasticsearch-env-from-file  elasticsearch-migrate  elasticsearch-plugin         elasticsearch-setup-passwords  elasticsearch-sql-cli            elasticsearch-syskeygen  x-pack-env           x-pack-watcher-env
    elasticsearch-certgen  elasticsearch-cli       elasticsearch-env       elasticsearch-keystore       elasticsearch-node     elasticsearch-saml-metadata  elasticsearch-shard            elasticsearch-sql-cli-7.9.3.jar  elasticsearch-users      x-pack-security-env
    [root@elasticsearch bin]# elasticsearch-plugin install ingest-attachment
    -> Installing ingest-attachment
    -> Downloading ingest-attachment from elastic
    [=================================================] 100%
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @     WARNING: plugin requires additional permissions     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    * java.lang.RuntimePermission accessClassInPackage.sun.java2d.cmm.kcms
    * java.lang.RuntimePermission accessDeclaredMembers
    * java.lang.RuntimePermission getClassLoader
    * java.lang.reflect.ReflectPermission suppressAccessChecks
    * java.security.SecurityPermission createAccessControlContext
    * java.security.SecurityPermission insertProvider
    * java.security.SecurityPermission putProviderProperty.BC
    See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
    for descriptions of what these permissions allow and the associated risks.

    Continue with installation? [y/N]y
    -> Installed ingest-attachment

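Once "Installed" is shown, the installation is complete. Restart Elasticsearch afterwards, otherwise step 2 will fail with an error.

Step 2: Create a text-extraction pipeline

The pipeline converts uploaded attachments into text content. It supports Word, PDF, and TXT; Excel was not tested but should also be supported.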

 

    {
        "description": "Extract attachment information",
        "processors": [
            {
                "attachment": {
                    "field": "content",
                    "ignore_missing": true
                }
            },
            {
                "remove": {
                    "field": "content"
                }
            }
        ]
    }
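
The pipeline definition has to be registered with Elasticsearch before it can be referenced. The following is a minimal sketch, not the author's code, that builds such a request with the JDK's java.net.http API (Java 11+). The pipeline name attachment matches the name the upload service passes to setPipeline; the class name and the base URL are assumptions for illustration, and authentication and the actual send are left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the request that registers the pipeline.
final class AttachmentPipelineRequest {

    // Same pipeline body as in the article, written as a compact JSON string.
    static final String PIPELINE_JSON =
            "{\"description\":\"Extract attachment information\","
            + "\"processors\":["
            + "{\"attachment\":{\"field\":\"content\",\"ignore_missing\":true}},"
            + "{\"remove\":{\"field\":\"content\"}}"
            + "]}";

    // baseUrl is an assumption, e.g. "http://127.0.0.1:9200".
    static HttpRequest build(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_ingest/pipeline/attachment"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(PIPELINE_JSON))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://127.0.0.1:9200");
        // Sending would additionally need an HttpClient and, on a secured cluster, credentials.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The same request can be issued from the Kibana console; the Java form is shown only to stay in the language used by the rest of the project.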

Step 3: Define the index that stores the content

    {
      "mappings": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "fileName": {
            "type": "text",
            "analyzer": "my_ana"
          },
          "contentType": {
            "type": "text",
            "analyzer": "my_ana"
          },
          "fileUrl": {
            "type": "text"
          },
          "attachment": {
            "properties": {
              "content": {
                "type": "text",
                "analyzer": "my_ana"
              }
            }
          }
        }
      },
      "settings": {
        "analysis": {
          "filter": {
            "jieba_stop": {
              "type": "stop",
              "stopwords_path": "stopword/stopwords.txt"
            },
            "jieba_synonym": {
              "type": "synonym",
              "synonyms_path": "synonym/synonyms.txt"
            }
          },
          "analyzer": {
            "my_ana": {
              "tokenizer": "jieba_index",
              "filter": [
                "lowercase",
                "jieba_stop",
                "jieba_synonym"
              ]
            }
          }
        }
      }
    }
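  • mappings: defines the format of the stored fields

  • settings: the index configuration; here it defines an analyzer (using the jieba tokenizer)

Note: content search runs against the attachment.content field, which must be configured with the word-segmentation analyzer; without it, searches will not find the content.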
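
To make the role of attachment.content concrete, here is a hedged sketch of a search body that matches on that field and asks for highlighting. It only assembles the JSON string; the class name and the deliberately minimal escaping helper are assumptions for illustration, and a real project would normally use a JSON library or the client's query builders.

```java
// Hypothetical helper that assembles a search body for the attachment.content field.
final class ContentSearchBody {

    // Minimal escaping for quotes and backslashes only; not a complete JSON escaper.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a match query on attachment.content plus a highlight section for the same field.
    static String build(String keyword) {
        return "{\"query\":{\"match\":{\"attachment.content\":\"" + escape(keyword) + "\"}},"
                + "\"highlight\":{\"fields\":{\"attachment.content\":{}}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("全库备份"));
    }
}
```

Because the field is mapped with an analyzer, the keyword is tokenized at query time and compared against the tokens produced when the document was indexed.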

Step 4: Test

 

    {
        "id": "1",
        "name": "进口红酒",
        "filetype": "pdf",
        "contenttype": "文章",
        "content": "文章内容"
    }

For the test, the attachment has to be converted to Base64 and placed in the content field.

An online file-to-Base64 converter:

https://www.zhangxinxu.com/sp/base64.html
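
The conversion can also be done locally with the JDK. The sketch below is illustrative only; the class and method names are assumptions, and for a real file the bytes would come from something like Files.readAllBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: Base64-encodes raw file bytes for the "content" field.
final class AttachmentEncoder {

    static String toBase64(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    public static void main(String[] args) {
        // A short in-memory sample stands in for the bytes of an uploaded file.
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toBase64(sample)); // prints aGVsbG8=
    }
}
```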

Query the files that were just uploaded:

    {
        "took": 861,
        "timed_out": false,
        "_shards": {
            "total": 1,
            "successful": 1,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 5,
                "relation": "eq"
            },
            "max_score": 1.0,
            "hits": [
                {
                    "_index": "fileinfo",
                    "_type": "_doc",
                    "_id": "lkPEgYIBz3NlBKQzXYX9",
                    "_score": 1.0,
                    "_source": {
                        "fileName": "测试_20220809164145A002.docx",
                        "updateTime": 1660034506000,
                        "attachment": {
                            "date": "2022-08-09T01:38:00Z",
                            "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                            "author": "DELL",
                            "language": "lt",
                            "content": "内容",
                            "content_length": 2572
                        },
                        "createTime": 1660034506000,
                        "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/09/测试_20220809164145A002.docx",
                        "id": 1306333192,
                        "contentType": "文章",
                        "fileType": "docx"
                    }
                },
                {
                    "_index": "fileinfo",
                    "_type": "_doc",
                    "_id": "mUPHgYIBz3NlBKQzwIVW",
                    "_score": 1.0,
                    "_source": {
                        "fileName": "测试_20220809164527A001.docx",
                        "updateTime": 1660034728000,
                        "attachment": {
                            "date": "2022-08-09T01:38:00Z",
                            "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                            "author": "DELL",
                            "language": "lt",
                            "content": "内容",
                            "content_length": 2572
                        },
                        "createTime": 1660034728000,
                        "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/09/测试_20220809164527A001.docx",
                        "id": 1306333193,
                        "contentType": "文章",
                        "fileType": "docx"
                    }
                },
                {
                    "_index": "fileinfo",
                    "_type": "_doc",
                    "_id": "JDqshoIBbkTNu1UgkzFK",
                    "_score": 1.0,
                    "_source": {
                        "fileName": "txt测试_20220810153351A001.txt",
                        "updateTime": 1660116831000,
                        "attachment": {
                            "content_type": "text/plain; charset=UTF-8",
                            "language": "lt",
                            "content": "内容",
                            "content_length": 804
                        },
                        "createTime": 1660116831000,
                        "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/10/txt测试_20220810153351A001.txt",
                        "id": 1306333194,
                        "contentType": "告示",
                        "fileType": "txt"
                    }
                }
            ]
        }
    }

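After calling the upload endpoint, the text content has been extracted into Elasticsearch. From here the content can be searched with word segmentation and displayed with highlighting.

III. Code

The code logic in brief: on file upload, the attachment metadata and upload URL are stored in the database; Elasticsearch is called to extract the text, and the extracted content is stored in the corresponding index; a full-text search API for the mini program provides keyword suggestions based on file names, fuzzy full-text matching on file name and content, and highlighting of the matched terms. The code follows.

YAML configuration file: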

    # Data source configuration
    spring:
        # Service module
        devtools:
            restart:
                # Hot-reload switch
                enabled: true
        # Search engine
        elasticsearch:
            rest:
                url: 127.0.0.1
                uris: 127.0.0.1:9200
                connection-timeout: 1000
                read-timeout: 3000
                username: elastic
                password: 123456

ElasticsearchConfig (connection configuration)

    package com.yj.rselasticsearch.domain.config;

    import org.apache.http.HttpHost;
    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.impl.client.BasicCredentialsProvider;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    import java.time.Duration;

    @Configuration
    public class ElasticsearchConfig {
        @Value("${spring.elasticsearch.rest.url}")
        private String esUrl;
        @Value("${spring.elasticsearch.rest.username}")
        private String userName;
        @Value("${spring.elasticsearch.rest.password}")
        private String password;

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            // Set the user name and password for the connection
            final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
            credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
            return new RestHighLevelClient(RestClient.builder(
                            new HttpHost(esUrl, 9200, "http"))
                    .setHttpClientConfigCallback(httpClientBuilder -> {
                        httpClientBuilder.disableAuthCaching();
                        // Keep pooled connections alive; without this, the first request after
                        // Elasticsearch had been idle for a while used to time out
                        httpClientBuilder.setKeepAliveStrategy((response, context) -> Duration.ofMinutes(5).toMillis());
                        return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                    })
            );
        }
    }

File upload: save the file metadata and extract the content into Elasticsearch

Entity class FileInfo

    package com.yj.common.core.domain.entity;

    import com.baomidou.mybatisplus.annotation.TableField;
    import lombok.Getter;
    import lombok.Setter;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;

    import java.util.Date;

    @Setter
    @Getter
    @Document(indexName = "fileinfo", createIndex = false)
    public class FileInfo {
        /**
         * Primary key
         */
        @Field(name = "id", type = FieldType.Integer)
        private Integer id;

        /**
         * File name
         */
        @Field(name = "fileName", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
        private String fileName;

        /**
         * File type
         */
        @Field(name = "fileType", type = FieldType.Keyword)
        private String fileType;

        /**
         * Content type
         */
        @Field(name = "contentType", type = FieldType.Text)
        private String contentType;

        /**
         * Attachment content (not a database column)
         */
        @Field(name = "attachment.content", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
        @TableField(exist = false)
        private String content;

        /**
         * File URL
         */
        @Field(name = "fileUrl", type = FieldType.Text)
        private String fileUrl;

        /**
         * Creation time
         */
        private Date createTime;

        /**
         * Update time
         */
        private Date updateTime;
    }

Controller class

    package com.yj.rselasticsearch.controller;

    import com.yj.common.core.controller.BaseController;
    import com.yj.common.core.domain.AjaxResult;
    import com.yj.rselasticsearch.service.FileInfoService;
    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.multipart.MultipartFile;

    import javax.annotation.Resource;

    /**
     * Controller for the file_info table
     *
     * @author xxxxx
     */
    @RestController
    @RequestMapping("/fileInfo")
    public class FileInfoController extends BaseController {
        /**
         * Service object
         */
        @Resource
        private FileInfoService fileInfoService;

        @PutMapping("uploadFile")
        public AjaxResult uploadFile(String contentType, MultipartFile file) {
            return fileInfoService.uploadFileInfo(contentType, file);
        }
    }

Service implementation class

    package com.yj.rselasticsearch.service.impl;

    import com.alibaba.fastjson.JSON;
    import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
    import com.yj.common.config.RuoYiConfig;
    import com.yj.common.core.domain.AjaxResult;
    import com.yj.common.core.domain.entity.FileInfo;
    import com.yj.common.exception.ServiceException;
    import com.yj.common.utils.FastUtils;
    import com.yj.common.utils.file.FileUploadUtils;
    import com.yj.common.utils.file.FileUtils;
    import com.yj.framework.config.ServerConfig;
    import com.yj.rselasticsearch.mapper.FileInfoMapper;
    import com.yj.rselasticsearch.service.FileInfoService;
    import lombok.extern.slf4j.Slf4j;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;
    import org.springframework.web.multipart.MultipartFile;

    import javax.annotation.Resource;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Base64;

    @Service
    @Slf4j
    public class FileInfoServiceImpl implements FileInfoService {
        @Resource
        private ServerConfig serverConfig;

        @Autowired
        @Qualifier("restHighLevelClient")
        private RestHighLevelClient client;

        @Resource
        private FileInfoMapper fileInfoMapper;

        /**
         * Uploads a file, then sends its content to Elasticsearch for text extraction.
         *
         * @param contentType content type label of the document
         * @param file        uploaded file
         * @return the saved file information, or an error result
         */
        @Override
        public AjaxResult uploadFileInfo(String contentType, MultipartFile file) {
            if (FastUtils.checkNullOrEmpty(contentType, file)) {
                return AjaxResult.error("请求参数不能为空");
            }
            try {
                // Upload directory
                String filePath = RuoYiConfig.getUploadPath() + "/fileInfo";
                FileInfo fileInfo = new FileInfo();
                // Upload and get back the new file name
                String fileName = FileUploadUtils.upload(filePath, file);
                // File extension without the dot, e.g. "docx"
                String extension = fileName.substring(fileName.lastIndexOf(".") + 1);
                File tempFile = File.createTempFile(fileName, extension);
                file.transferTo(tempFile);
                String url = serverConfig.getUrl() + "/fileInfo" + fileName;
                fileInfo.setFileName(FileUtils.getName(fileName));
                fileInfo.setFileType(extension);
                fileInfo.setFileUrl(url);
                fileInfo.setContentType(contentType);
                int result = fileInfoMapper.insertSelective(fileInfo);
                if (result > 0) {
                    fileInfo = fileInfoMapper.selectOne(new LambdaQueryWrapper<FileInfo>().eq(FileInfo::getFileUrl, fileInfo.getFileUrl()));
                    byte[] bytes = getContent(tempFile);
                    String base64 = Base64.getEncoder().encodeToString(bytes);
                    fileInfo.setContent(base64);
                    IndexRequest indexRequest = new IndexRequest("fileinfo");
                    // Index the document through the attachment pipeline so the text is extracted
                    indexRequest.source(JSON.toJSONString(fileInfo), XContentType.JSON);
                    indexRequest.setPipeline("attachment");
                    IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
                    log.info("indexResponse:" + indexResponse);
                }
                return AjaxResult.success(fileInfo);
            } catch (Exception e) {
                return AjaxResult.error(e.getMessage());
            }
        }

        /**
         * Reads the whole file into a byte array (the caller Base64-encodes it).
         *
         * @param file file to read
         * @return the file content
         * @throws IOException if reading fails
         */
        private byte[] getContent(File file) throws IOException {
            long fileSize = file.length();
            if (fileSize > Integer.MAX_VALUE) {
                // Fail explicitly instead of returning null, which would break the Base64 step
                throw new ServiceException("File too big: " + file.getName());
            }
            byte[] buffer = new byte[(int) fileSize];
            int offset = 0;
            // try-with-resources closes the stream even when reading fails
            try (FileInputStream fi = new FileInputStream(file)) {
                int numRead;
                while (offset < buffer.length
                        && (numRead = fi.read(buffer, offset, buffer.length - offset)) >= 0) {
                    offset += numRead;
                }
            }
            // Make sure all data was read
            if (offset != buffer.length) {
                throw new ServiceException("Could not completely read file "
                        + file.getName());
            }
            return buffer;
        }
    }

Highlighted tokenized search

Request parameter class WarningInfoDto

    package com.yj.rselasticsearch.domain.dto;

    import io.swagger.annotations.ApiModel;
    import io.swagger.annotations.ApiModelProperty;
    import lombok.Data;

    import java.util.List;

    /**
     * Data transfer object for front-end requests
     * WarningInfo
     * @author luoY
     */
    @Data
    @ApiModel(value = "WarningInfoDto", description = "告警信息")
    public class WarningInfoDto {
        /**
         * Page number
         */
        @ApiModelProperty("页数")
        private Integer pageIndex;

        /**
         * Page size
         */
        @ApiModelProperty("每页数量")
        private Integer pageSize;

        /**
         * Search keyword
         */
        @ApiModelProperty("查询关键词")
        private String keyword;

        /**
         * Content types
         */
        private List<String> contentType;

        /**
         * User phone number
         */
        private String phone;
    }

Controller class

    package com.yj.rselasticsearch.controller;

    import com.baomidou.mybatisplus.core.metadata.IPage;
    import com.yj.common.core.controller.BaseController;
    import com.yj.common.core.domain.AjaxResult;
    import com.yj.common.core.domain.entity.FileInfo;
    import com.yj.rselasticsearch.domain.dto.WarningInfoDto;
    import com.yj.rselasticsearch.service.ElasticsearchService;
    import io.swagger.annotations.Api;
    import io.swagger.annotations.ApiImplicitParam;
    import io.swagger.annotations.ApiImplicitParams;
    import io.swagger.annotations.ApiOperation;
    import org.springframework.web.bind.annotation.*;

    import javax.annotation.Resource;
    import javax.servlet.http.HttpServletRequest;
    import java.util.List;

    /**
     * Elasticsearch search endpoints
     *
     * @author luoy
     */
    @Api("搜索引擎")
    @RestController
    @RequestMapping("es")
    public class ElasticsearchController extends BaseController {
        @Resource
        private ElasticsearchService elasticsearchService;

        /**
         * Keyword suggestions
         *
         * @param warningInfoDto request parameters
         * @return suggested file names
         */
        @ApiOperation("关键词联想")
        @ApiImplicitParams({
                @ApiImplicitParam(name = "contenttype", value = "文档类型", required = true, dataType = "String", dataTypeClass = String.class),
                @ApiImplicitParam(name = "keyword", value = "关键词", required = true, dataType = "String", dataTypeClass = String.class)
        })
        @PostMapping("getAssociationalWordDoc")
        public AjaxResult getAssociationalWordDoc(@RequestBody WarningInfoDto warningInfoDto, HttpServletRequest request) {
            List<String> words = elasticsearchService.getAssociationalWordOther(warningInfoDto, request);
            return AjaxResult.success(words);
        }

        /**
         * Paged, highlighted tokenized search
         *
         * @param warningInfoDto request parameters
         * @return one page of matching files
         */
        @ApiOperation("高亮分词分页查询")
        @ApiImplicitParams({
                @ApiImplicitParam(name = "keyword", value = "关键词", required = true, dataType = "String", dataTypeClass = String.class),
                @ApiImplicitParam(name = "pageIndex", value = "页码", required = true, dataType = "Integer", dataTypeClass = Integer.class),
                @ApiImplicitParam(name = "pageSize", value = "页数", required = true, dataType = "Integer", dataTypeClass = Integer.class),
                @ApiImplicitParam(name = "contenttype", value = "文档类型", required = true, dataType = "String", dataTypeClass = String.class)
        })
        @PostMapping("queryHighLightWordDoc")
        public AjaxResult queryHighLightWordDoc(@RequestBody WarningInfoDto warningInfoDto, HttpServletRequest request) {
            IPage<FileInfo> warningInfoListPage = elasticsearchService.queryHighLightWordOther(warningInfoDto, request);
            return AjaxResult.success(warningInfoListPage);
        }
    }

Service implementation class

    package com.yj.rselasticsearch.service.impl;

    import com.baomidou.mybatisplus.core.metadata.IPage;
    import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
    import com.yj.common.core.domain.entity.FileInfo;
    import com.yj.common.exception.ServiceException;
    import com.yj.common.utils.FastUtils;
    import com.yj.rselasticsearch.domain.dto.WarningInfoDto;
    import com.yj.rselasticsearch.service.*;
    import lombok.extern.slf4j.Slf4j;
    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
    import org.springframework.data.domain.PageRequest;
    import org.springframework.data.domain.Pageable;
    import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
    import org.springframework.data.elasticsearch.core.SearchHit;
    import org.springframework.data.elasticsearch.core.SearchHits;
    import org.springframework.data.elasticsearch.core.query.*;
    import org.springframework.stereotype.Service;

    import javax.annotation.Resource;
    import javax.servlet.http.HttpServletRequest;
    import java.util.*;
    import java.util.stream.Collectors;

    @Service
    @Slf4j
    public class ElasticsearchServiceImpl implements ElasticsearchService {

        // Used by both search methods below
        @Resource
        private ElasticsearchRestTemplate elasticsearchRestTemplate;

        /**
         * Keyword suggestions for documents (suggests file names from the text in the search box)
         *
         * @param warningInfoDto request parameters
         * @return up to nine distinct highlighted file names, or null if nothing matched
         */
        @Override
        public List<String> getAssociationalWordOther(WarningInfoDto warningInfoDto, HttpServletRequest request) {
            // Field to search
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchBoolPrefixQuery("fileName", warningInfoDto.getKeyword()));
            // Filter by contentType label
            boolQueryBuilder.must(QueryBuilders.termsQuery("contentType", warningInfoDto.getContentType()));
            // Build the highlighted query
            NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                    .withQuery(boolQueryBuilder)
                    .withHighlightFields(
                            new HighlightBuilder.Field("fileName")
                    )
                    .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                    .build();
            // Run the query
            SearchHits<FileInfo> search;
            try {
                search = elasticsearchRestTemplate.search(searchQuery, FileInfo.class);
            } catch (Exception ex) {
                log.error("Elasticsearch query failed", ex);
                throw new ServiceException(String.format("操作错误,请联系管理员!%s", ex.getMessage()));
            }
            // Collect the highlighted file names
            List<String> resultList = new LinkedList<>();
            for (SearchHit<FileInfo> searchHit : search.getSearchHits()) {
                // Only hits whose file name actually matched carry a highlight
                List<String> fragments = searchHit.getHighlightFields().get("fileName");
                if (fragments != null) {
                    resultList.add(fragments.get(0));
                }
            }
            if (FastUtils.checkNullOrEmpty(resultList)) {
                return null;
            }
            // Deduplicate and keep at most nine suggestions; limit() is safe even when
            // fewer than nine distinct names remain
            return resultList.stream().distinct().limit(9).collect(Collectors.toList());
        }

        /**
         * Highlighted tokenized search over other document types
         *
         * @param warningInfoDto request parameters
         * @param request        HTTP request
         * @return one page of matching files
         */
        @Override
        public IPage<FileInfo> queryHighLightWordOther(WarningInfoDto warningInfoDto, HttpServletRequest request) {
            // Paging (Spring Data pages are zero-based)
            Pageable pageable = PageRequest.of(warningInfoDto.getPageIndex() - 1, warningInfoDto.getPageSize());
            // Fields to search: tokenized full-text search over fileName and the extracted content
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchBoolPrefixQuery("fileName", warningInfoDto.getKeyword()))
                    .should(QueryBuilders.matchBoolPrefixQuery("attachment.content", warningInfoDto.getKeyword()));
            // Filter by contentType label
            boolQueryBuilder.must(QueryBuilders.termsQuery("contentType", warningInfoDto.getContentType()));
            // Build the highlighted query
            NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                    .withQuery(boolQueryBuilder)
                    // Apply the requested page to the query
                    .withPageable(pageable)
                    .withHighlightFields(
                            new HighlightBuilder.Field("fileName"), new HighlightBuilder.Field("attachment.content")
                    )
                    .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                    .build();
            // Run the query
            SearchHits<FileInfo> search;
            try {
                search = elasticsearchRestTemplate.search(searchQuery, FileInfo.class);
            } catch (Exception ex) {
                log.error("Elasticsearch query failed", ex);
                throw new ServiceException(String.format("操作错误,请联系管理员!%s", ex.getMessage()));
            }
            // Entities to return
            List<FileInfo> resultList = new LinkedList<>();
            for (SearchHit<FileInfo> searchHit : search.getSearchHits()) {
                // Highlighted fragments
                Map<String, List<String>> highlightFields = searchHit.getHighlightFields();
                // Replace the fields with their highlighted versions where available
                searchHit.getContent().setFileName(highlightFields.get("fileName") == null ? searchHit.getContent().getFileName() : highlightFields.get("fileName").get(0));
                searchHit.getContent().setContent(highlightFields.get("content") == null ? searchHit.getContent().getContent() : highlightFields.get("content").get(0));
                resultList.add(searchHit.getContent());
            }
            // Assemble the page information manually
            IPage<FileInfo> warningInfoIPage = new Page<>();
            warningInfoIPage.setTotal(search.getTotalHits());
            warningInfoIPage.setRecords(resultList);
            warningInfoIPage.setCurrent(warningInfoDto.getPageIndex());
            warningInfoIPage.setSize(warningInfoDto.getPageSize());
            // Page count is the ceiling of total / pageSize
            long pageSize = warningInfoDto.getPageSize();
            warningInfoIPage.setPages((warningInfoIPage.getTotal() + pageSize - 1) / pageSize);
            return warningInfoIPage;
        }
    }
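
Two small pieces of arithmetic in the search service are easy to get wrong: the number of pages when the page object is assembled by hand, and trimming the suggestion list to a fixed number of distinct entries. The sketch below isolates both in plain Java so they can be checked without Elasticsearch; the class and method names are hypothetical and not part of the project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers illustrating the paging and de-duplication logic in isolation.
final class PagingHelpers {

    // Ceiling division: 25 hits at 10 per page need 3 pages, 20 hits need 2.
    static long totalPages(long total, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (total + pageSize - 1) / pageSize;
    }

    // Keeps the first `limit` distinct values in encounter order; unlike subList,
    // limit() does not fail when fewer distinct values are available.
    static List<String> firstDistinct(List<String> values, int limit) {
        return values.stream().distinct().limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(totalPages(25, 10)); // prints 3
        System.out.println(firstDistinct(Arrays.asList("a", "b", "a", "c"), 2)); // prints [a, b]
    }
}
```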

Testing the code:

 

    Request JSON:

    {
        "keyword": "全库备份",
        "contentType": ["告示"],
        "pageIndex": 1,
        "pageSize": 10
    }

    Response:

    {
        "msg": "操作成功",
        "code": 200,
        "data": {
            "records": [
                {
                    "id": 1306333194,
                    "fileName": "txt测试_20220810153351A001.txt",
                    "fileType": "txt",
                    "contentType": "告示",
                    "content": "•\t秒级快速<span style='color:red'>备份</span>\r\n不论多大的数据量,<span style='color:red'>全库</span><span style='color:red'>备份</span>只需30秒,而且<span style='color:red'>备份过程</span>不会对数据库加锁,对应用程序几乎无影响,全天24小时均可进行<span style='color:red'>备份</span>。",
                    "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/10/txt测试_20220810153351A001.txt",
                    "createTime": "2022-08-10T15:33:51.000+08:00",
                    "updateTime": "2022-08-10T15:33:51.000+08:00"
                }
            ],
            "total": 1,
            "size": 10,
            "current": 1,
            "orders": [],
            "optimizeCountSql": true,
            "searchCount": true,
            "countId": null,
            "maxLimit": null,
            "pages": 1
        }
    }

The response contains the content matched by the tokenized search, with the matched terms highlighted.
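
The rule for filling a field in the response is: use the first highlighted fragment if the field matched, otherwise keep the stored value. A minimal sketch of that rule in plain Java follows; the class and method names are hypothetical, and the map stands in for the highlight fields returned with a search hit.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks the highlighted fragment for a field, or falls back to the original value.
final class HighlightFallback {

    static String pick(Map<String, List<String>> highlights, String field, String original) {
        List<String> fragments = highlights.get(field);
        return (fragments == null || fragments.isEmpty()) ? original : fragments.get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> highlights = Collections.singletonMap(
                "fileName", Collections.singletonList("<span style='color:red'>txt</span>测试.txt"));
        System.out.println(pick(highlights, "fileName", "txt测试.txt")); // highlighted fragment
        System.out.println(pick(highlights, "content", "original text")); // prints original text
    }
}
```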
