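Definition:
A search engine is a system that, following a given strategy and using dedicated programs, collects information from the internet, organizes and processes it, and then offers a retrieval service that presents the relevant results to users.
For example, entering “康熙” (Kangxi) into Baidu search brings up a results page like the following:
From the user's point of view: the user enters a query (one or more words), and the system finds every existing document that contains those words and returns them as a list. If there are m documents with an average length (title + content) of n, scanning all of them costs O(m*n) per query; in practice m is very large, so this approach is ruled out on performance grounds.
The standard solution is an inverted index.
Document: an item being searched, such as an HTML page, a PDF, an image, or a video.
Inverted index: maps a word to the documents that contain it. Each entry describes one word: which documents it appears in and how important it is in each of them.
Forward index: maps a document to the words it contains. Each entry describes one document; its text is segmented into words, which are then stored.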
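To make the two definitions concrete, here is a minimal in-memory sketch. It assumes documents are plain strings split on whitespace (the project itself uses a real word segmenter), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Forward index: docId -> the words of that document.
    // Splitting on whitespace is an assumption made to keep the sketch short.
    static Map<Integer, List<String>> buildForward(List<String> docs) {
        Map<Integer, List<String>> forward = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            List<String> words = new ArrayList<>();
            for (String w : docs.get(docId).toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            forward.put(docId, words);
        }
        return forward;
    }

    // Inverted index: word -> sorted set of docIds that contain the word.
    static Map<String, Set<Integer>> buildInverted(Map<Integer, List<String>> forward) {
        Map<String, Set<Integer>> inverted = new HashMap<>();
        for (Map.Entry<Integer, List<String>> e : forward.entrySet()) {
            for (String w : e.getValue()) {
                inverted.computeIfAbsent(w, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("array list", "linked list", "hash map");
        Map<String, Set<Integer>> inverted = buildInverted(buildForward(docs));
        System.out.println(inverted.get("list")); // [0, 1]
        System.out.println(inverted.get("map"));  // [2]
    }
}
```

A query for "list" is now a single map lookup instead of a scan over every document, which is the point of the inverted index.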
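This project only searches the HTML pages of the JDK API documentation. Download: documentation download
The following two parts are mainly implemented:
forward.json: stores the forward index; performance was not a concern here, so the easy-to-read JSON format is used;
inverted.json: stores the inverted index;
Steps for building the index:
1. Scan all documents under the documentation directory: the directory traversal is done by FileScanner;
2. Analyze and process each document to obtain its title, the URL it is finally served from, and its content;
3. For each document we now have the title, the URL, the content, and every word of the title and content; this is enough to build the index;
4. Save the index (either as a file in the file system or as records in a database table)
Create the Spring Boot project
To build the search engine, Lombok is the only dependency that needs to be selected;
Click Finish to complete the creation of the Spring Boot project;
Under indexer, create command.Indexer, the index-building module and the logical entry point of the whole program;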
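- /**
- * Index-building module; the logical entry point of the whole program.
- */
- @Slf4j // Lombok: generates the `log` field used below
- @Component // register as a Spring bean
- //@Profile("run") // keep this bean from loading when tests run (run != test)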
- public class Indexer implements CommandLineRunner {
- // Collaborators injected by Spring (file scanner, configuration, index manager, thread pool)
- private final FileScanner fileScanner;
- private final IndexerProperties properties;
- private final IndexManager indexManager;
- private final ExecutorService executorService;
-
- @Autowired // constructor injection: the Spring container supplies these beans (dependency injection)
- public Indexer(FileScanner fileScanner, IndexerProperties properties, IndexManager indexManager, ExecutorService executorService) {
- this.fileScanner = fileScanner;
- this.properties = properties;
- this.indexManager = indexManager;
- this.executorService = executorService;
- }
-
- @Override
- public void run(String... args) throws Exception {
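- // Segment an arbitrary string once to warm up the segmenter, so its slow first call does not distort later timings
- ToAnalysis.parse("随便分个什么,进行预热,避免优化的时候计算第一次特别慢的时间");
-
- log.info("Indexer started: this is the logical entry point of the program");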
-
- // 1. Scan for all HTML files
- log.debug("Scanning directory for HTML files: {}", properties.getDocRootPath());
- List<File> htmlFileList = fileScanner.scanFile(properties.getDocRootPath(), file -> {
- return file.isFile() && file.getName().endsWith(".html");
- });
- log.debug("Scan finished, {} files found.", htmlFileList.size());
-
- // 2. For each HTML file, extract its title, URL and body text, and wrap them in a Document object
- File rootFile = new File(properties.getDocRootPath());
- List<Document> documentList = htmlFileList.stream()
- .parallel() // Note: because this is a Stream pipeline, adding .parallel() runs it in parallel and uses multiple cores
- .map(file -> new Document(file, properties.getUrlPrefix(), rootFile))
- .collect(Collectors.toList());
- log.debug("Documents built, {} in total.", documentList.size());
-
- // 3. Save the forward index
- indexManager.saveForwardIndexesConcurrent(documentList);
- log.debug("Forward index saved.");
-
- // 4. Build and save the inverted index
- indexManager.saveInvertedIndexesConcurrent(documentList);
- log.debug("Inverted index saved.");
-
- // 5. Shut down the thread pool
- executorService.shutdown();
- }
- }
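Scan for files and collect those that match a condition;
- @Slf4j // enable logging
- @Service // register as a Spring bean
- public class FileScanner {
- /**
- * Scans files starting from rootPath and returns every File that matches the filter as a List.
- * @param rootPath path of the root directory; the caller must make sure it exists and is a directory
- * @param filter filter.accept(file) is called for each file to decide whether it matches
- * @return all files that match
- */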
- public List<File> scanFile(String rootPath, FileFilter filter) {
- List<File> resultList = new ArrayList<>();
- File rootFile = new File(rootPath);
-
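- // Traverse the directory tree; depth-first or breadth-first both work, as long as every file is visited
- // Here we use a depth-first traversal, implemented recursively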
- traversal(rootFile, filter, resultList);
-
- return resultList;
- }
-
- private void traversal(File directoryFile, FileFilter filter, List<File> resultList) {
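- // 1. List the children of this directory
- File[] files = directoryFile.listFiles();
- if (files == null) {
- // listFiles() returns null if the path is not a directory or an I/O error occurs (typically a permission problem); skip it
- return;
- }
-
- // 2. Check every child against the filter
- for (File file : files) {
- // the return value of filter.accept(file) tells whether the file matches
- if (filter.accept(file)) {
- // it matches, so add it to the result list
- resultList.add(file);
- }
- }
-
- // 3. Recurse into every child that is a directory (depth-first)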
- for (File file : files) {
- if (file.isDirectory()) {
- traversal(file, filter, resultList);
- }
- }
- }
- }
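As an alternative to the hand-written recursion, the same scan could be sketched with the JDK's `Files.walk`. This is not the project's code: the class name `WalkScannerSketch` and the method `scanHtml` are made up for illustration, and the result is sorted only to make the output deterministic.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkScannerSketch {
    // Walks the tree under root and returns every regular file whose name ends with ".html".
    static List<File> scanHtml(Path root) throws IOException {
        // Files.walk returns a lazy stream that holds directory handles, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".html"))
                    .map(Path::toFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree to scan.
        Path root = Files.createTempDirectory("docs");
        Files.createDirectories(root.resolve("java/util"));
        Files.createFile(root.resolve("index.html"));
        Files.createFile(root.resolve("java/util/List.html"));
        Files.createFile(root.resolve("java/util/notes.txt"));
        System.out.println(scanHtml(root).size()); // 2
    }
}
```

One difference in behavior: `Files.walk` throws if a directory cannot be read, whereas the recursive version above silently skips it.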
Manage the index
- @Slf4j
- @Component
- public class IndexManager {
- private final IndexDatabaseMapper mapper;
- private final ExecutorService executorService;
-
- @Autowired
- public IndexManager(IndexDatabaseMapper mapper, ExecutorService executorService) {
- this.mapper = mapper;
- this.executorService = executorService;
- }
-
- // Build and save the forward index in batches (single-threaded version)
- public void saveForwardIndexes(List<Document> documentList) {
- // 1. Number of records per batch insert (each record is fairly large, so 10 is enough)
- int batchSize = 10;
- // 2. Number of SQL statements to run: ceil(documentList.size() / batchSize)
- int listSize = documentList.size();
- int times = (int) Math.ceil(1.0 * listSize / batchSize); // ceil: round up
- log.debug("{} batches in total.", times);
-
- // 3. Insert batch by batch
- for (int i = 0; i < listSize; i += batchSize) {
- // Take the documents of this batch with List.subList(int from, int to)
- int from = i;
- int to = Integer.min(from + batchSize, listSize);
-
- List<Document> subList = documentList.subList(from, to);
-
- // Batch-insert this subList
- mapper.batchInsertForwardIndexes(subList);
- }
- }
-
- @Timing("Build + save forward index (multithreaded)")
- @SneakyThrows
- public void saveForwardIndexesConcurrent(List<Document> documentList) {
- // 1. Number of records per batch insert (each record is fairly large, so 10 is enough)
- int batchSize = 10;
- // 2. Number of SQL statements to run: ceil(documentList.size() / batchSize)
- int listSize = documentList.size();
- int times = (int) Math.ceil(1.0 * listSize / batchSize); // ceil: round up
- log.debug("{} batches in total.", times);
-
- CountDownLatch latch = new CountDownLatch(times); // tracks task completion; starts at times (the number of batches)
-
- // 3. Submit the batches
- for (int i = 0; i < listSize; i += batchSize) {
- // Take the documents of this batch with List.subList(int from, int to)
- int from = i;
- int to = Integer.min(from + batchSize, listSize);
-
- Runnable task = () -> { // variables captured by an inner class or lambda must be final or effectively final
- try {
- List<Document> subList = documentList.subList(from, to);
-
- // Batch-insert this subList
- mapper.batchInsertForwardIndexes(subList);
- } finally {
- latch.countDown(); // count down even if the insert throws; otherwise latch.await() below would block forever
- }
- };
-
- executorService.submit(task); // the main thread only submits batches; pool threads do the actual inserts
- }
-
- // 4. The loop ending only means all tasks were submitted, not that they have finished
- // The main thread waits on the latch until its count reaches 0, i.e. until every task has completed
- latch.await();
- }
-
- @SneakyThrows
- public void saveInvertedIndexes(List<Document> documentList) {
- int batchSize = 10000; // at most 10000 records per batch insert
-
- List<InvertedRecord> recordList = new ArrayList<>(); // records waiting to be inserted
-
- for (Document document : documentList) {
- Map<String, Integer> wordToWeight = document.segWordAndCalcWeight();
- for (Map.Entry<String, Integer> entry : wordToWeight.entrySet()) {
- String word = entry.getKey();
- int docId = document.getDocId();
- int weight = entry.getValue();
-
- InvertedRecord record = new InvertedRecord(word, docId, weight);
-
- recordList.add(record);
-
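- // Once recordList.size() == batchSize, there is enough for one insert
- if (recordList.size() == batchSize) {
- mapper.batchInsertInvertedIndexes(recordList); // batch insert
- recordList.clear(); // empty the list so that its size is 0 again
- }
- }
- }
-
- // Insert the leftover records that never filled a whole batch; skip the call when nothing is left,
- // since a batch insert with no rows may produce invalid SQL
- if (!recordList.isEmpty()) {
- mapper.batchInsertInvertedIndexes(recordList); // batch insert
- }
- recordList.clear();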
- }
-
- static class InvertedInsertTask implements Runnable {
- private final CountDownLatch latch;
- private final int batchSize;
- private final List<Document> documentList;
- private final IndexDatabaseMapper mapper;
-
- InvertedInsertTask(CountDownLatch latch, int batchSize, List<Document> documentList, IndexDatabaseMapper mapper) {
- this.latch = latch;
- this.batchSize = batchSize;
- this.documentList = documentList;
- this.mapper = mapper;
- }
-
- @Override
- public void run() {
- List<InvertedRecord> recordList = new ArrayList<>(); // records waiting to be inserted
-
- try {
- for (Document document : documentList) {
- Map<String, Integer> wordToWeight = document.segWordAndCalcWeight();
- for (Map.Entry<String, Integer> entry : wordToWeight.entrySet()) {
- String word = entry.getKey();
- int docId = document.getDocId();
- int weight = entry.getValue();
-
- InvertedRecord record = new InvertedRecord(word, docId, weight);
-
- recordList.add(record);
-
- // Once recordList.size() == batchSize, there is enough for one insert
- if (recordList.size() == batchSize) {
- mapper.batchInsertInvertedIndexes(recordList); // batch insert
- recordList.clear(); // empty the list so that its size is 0 again
- }
- }
- }
-
- // Insert the leftover records that never filled a whole batch; skip the call when nothing is left
- if (!recordList.isEmpty()) {
- mapper.batchInsertInvertedIndexes(recordList); // batch insert
- }
- recordList.clear();
- } finally {
- latch.countDown(); // always count down so the waiting thread is released even if an insert fails
- }
- }
- }
-
- @Timing("Build + save inverted index (multithreaded)")
- @SneakyThrows
- public void saveInvertedIndexesConcurrent(List<Document> documentList) {
- int batchSize = 10000; // at most 10000 records per batch insert
- int groupSize = 50;
- int listSize = documentList.size();
- int times = (int) Math.ceil(listSize * 1.0 / groupSize);
- CountDownLatch latch = new CountDownLatch(times);
-
- for (int i = 0; i < listSize; i += groupSize) {
- int from = i;
- int to = Integer.min(from + groupSize, listSize);
- List<Document> subList = documentList.subList(from, to);
- Runnable task = new InvertedInsertTask(latch, batchSize, subList, mapper);
- executorService.submit(task);
- }
-
- latch.await();
- }
- }
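The batching-plus-latch pattern used by the concurrent methods can be tried in isolation. The sketch below is self-contained and replaces the database insert with a counter; `BatchLatchSketch` and `processInBatches` are hypothetical names, and the fixed thread pool is an assumption, since the project's `ExecutorService` bean is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BatchLatchSketch {
    // Splits items into batches, handles each batch on a pool thread,
    // waits for all of them, and returns how many items were handled.
    static int processInBatches(List<Integer> items, int batchSize, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int times = (items.size() + batchSize - 1) / batchSize; // integer ceiling division
        CountDownLatch latch = new CountDownLatch(times);
        AtomicInteger handled = new AtomicInteger();
        try {
            for (int i = 0; i < items.size(); i += batchSize) {
                List<Integer> batch = items.subList(i, Math.min(i + batchSize, items.size()));
                pool.submit(() -> {
                    try {
                        handled.addAndGet(batch.size()); // stand-in for the batch insert
                    } finally {
                        latch.countDown(); // count down even if the work throws
                    }
                });
            }
            latch.await(); // blocks until every batch has counted down
        } finally {
            pool.shutdown();
        }
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            items.add(i);
        }
        System.out.println(processInBatches(items, 10, 4)); // 25
    }
}
```

With 25 items and a batch size of 10, the latch starts at 3 and the last batch holds only 5 items, which mirrors the `Integer.min(from + batchSize, listSize)` bound in the code above.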
- @Component // register as a Spring bean
- @ConfigurationProperties("searcher.indexer")
- @Data // Lombok: @Getter + @Setter + @ToString + @EqualsAndHashCode + @RequiredArgsConstructor
- public class IndexerProperties {
- // bound to searcher.indexer.doc-root-path in application.yml
- private String docRootPath;
- // bound to searcher.indexer.url-prefix in application.yml
- private String urlPrefix;
- // bound to searcher.indexer.index-root-path in application.yml
- }
The Document class holds the docId, title, url and content of a document;
- @Slf4j
- @Data
- public class Document {
- private Integer docId; // assigned only after the forward-index insert
-
- private String title; // parsed from the file name
- private String url; // built from two pieces: (1) https://docs.oracle.com/javase/8/docs/api/ and (2) the file's path relative to the doc root
- private String content; // read from the file and then cleaned up
- }
- // Segment the document into words and compute a weight for each word
- public Map<String, Integer> segWordAndCalcWeight() {
- // Segment the title into words, dropping ignored words
- List<String> wordInTitle = ToAnalysis.parse(title)
- .getTerms()
- .stream()
- .parallel()
- .map(Term::getName)
- .filter(s -> !ignoredWordSet.contains(s))
- .collect(Collectors.toList());
-
- // Count how many times each word occurs in the title
- Map<String, Integer> titleWordCount = new HashMap<>();
- for (String word : wordInTitle) {
- int count = titleWordCount.getOrDefault(word, 0);
- titleWordCount.put(word, count + 1);
- }
-
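- // Segment the content the same way, then count how many times each word occurs
- List<String> wordInContent = ToAnalysis.parse(content)
- .getTerms()
- .stream()
- .parallel()
- .map(Term::getName)
- .filter(s -> !ignoredWordSet.contains(s)) // skip ignored words here too, as is done for the title
- .collect(Collectors.toList());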
- Map<String, Integer> contentWordCount = new HashMap<>();
- for (String word : wordInContent) {
- int count = contentWordCount.getOrDefault(word, 0);
- contentWordCount.put(word, count + 1);
- }
-
- // Compute the weights
- Map<String, Integer> wordToWeight = new HashMap<>();
- // First collect the distinct words
- Set<String> wordSet = new HashSet<>(wordInTitle);
- wordSet.addAll(wordInContent);
-
- for (String word : wordSet) {
- int titleCount = titleWordCount.getOrDefault(word, 0);
- int contentCount = contentWordCount.getOrDefault(word, 0);
- int weight = titleCount * 10 + contentCount;
-
- wordToWeight.put(word, weight);
- }
-
- return wordToWeight;
- }
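The weighting rule above (a title occurrence counts 10, a content occurrence counts 1) can be checked with a small standalone sketch. It assumes whitespace tokenization instead of the real segmenter and has no ignored-word list; `WeightSketch` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

class WeightSketch {
    // Counts word occurrences; whitespace splitting stands in for real segmentation.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // weight = titleCount * 10 + contentCount, the same formula as in the method above.
    static Map<String, Integer> weights(String title, String content) {
        Map<String, Integer> result = new HashMap<>();
        count(title).forEach((w, c) -> result.merge(w, c * 10, Integer::sum));
        count(content).forEach((w, c) -> result.merge(w, c, Integer::sum));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> w = weights("array list", "a list is a list");
        System.out.println(w.get("list"));  // 12 = 1 * 10 + 2
        System.out.println(w.get("array")); // 10 = 1 * 10 + 0
        System.out.println(w.get("a"));     // 2  = 0 * 10 + 2
    }
}
```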
Create the forward_indexes and inverted_indexes tables
- CREATE SCHEMA `searcher_refactor` DEFAULT CHARACTER SET utf8mb4 ;
-
- CREATE TABLE `searcher_refactor`.`forward_indexes` (
- `docid` INT NOT NULL AUTO_INCREMENT,
- `title` VARCHAR(100) NOT NULL,
- `url` VARCHAR(200) NOT NULL,
- `content` LONGTEXT NOT NULL,
- PRIMARY KEY (`docid`))
- COMMENT = 'Forward index\ndocid -> full document information';
-
- CREATE TABLE `searcher_refactor`.`inverted_indexes` (
- `id` INT NOT NULL AUTO_INCREMENT,
- `word` VARCHAR(100) NOT NULL,
- `docid` INT NOT NULL,
- `weight` INT NOT NULL,
- PRIMARY KEY (`id`))
- COMMENT = 'Inverted index\nword -> [ { docid + weight }, { docid + weight }, ... ]';
- // Maps one row of the inverted_indexes table (the table's id column is not needed here, so it is omitted)
- @Data
- public class InvertedRecord {
- private String word;
- private int docId;
- private int weight;
-
- public InvertedRecord(String word, int docId, int weight) {
- this.word = word;
- this.docId = docId;
- this.weight = weight;
- }
- }
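To show how the stored weight could be used at query time, here is a rough sketch that sums the weights of all matching (word, docId, weight) rows per document and orders documents by that total. The search side is not shown in this post, so treat this as one plausible approach rather than the project's actual ranking; `RankSketch`, `Posting` and `rank` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RankSketch {
    // Hypothetical stand-in for one row of the inverted index: (word, docId, weight).
    static class Posting {
        final String word;
        final int docId;
        final int weight;

        Posting(String word, int docId, int weight) {
            this.word = word;
            this.docId = docId;
            this.weight = weight;
        }
    }

    // Returns docIds ordered by total weight over all query words (descending), ties broken by docId.
    static List<Integer> rank(List<Posting> postings, List<String> queryWords) {
        Map<Integer, Integer> total = new HashMap<>();
        for (Posting p : postings) {
            if (queryWords.contains(p.word)) {
                total.merge(p.docId, p.weight, Integer::sum);
            }
        }
        List<Integer> docIds = new ArrayList<>(total.keySet());
        docIds.sort((a, b) -> {
            int byWeight = Integer.compare(total.get(b), total.get(a));
            return byWeight != 0 ? byWeight : Integer.compare(a, b);
        });
        return docIds;
    }

    public static void main(String[] args) {
        List<Posting> postings = Arrays.asList(
                new Posting("list", 1, 12),
                new Posting("list", 2, 3),
                new Posting("map", 2, 20),
                new Posting("map", 3, 5));
        System.out.println(rank(postings, Arrays.asList("list", "map"))); // [2, 1, 3]
        System.out.println(rank(postings, Arrays.asList("list")));        // [1, 2]
    }
}
```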
index.html
- <!DOCTYPE html>
- <html lang="zh-hans">
- <head>
- <meta charset="UTF-8">
- <meta http-equiv="X-UA-Compatible" content="IE=edge">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>神马搜索</title>
- <link rel="stylesheet" href="style.css">
- </head>
- <body>
- <div class="container">
- <i class="fa-brands fa-windows item"></i>
- <div class="search-box">
- <input type="text" class="search-btn" placeholder="搜索">
- </div>
- <i class="fa-solid fa-magnifying-glass item search-submit"></i>
- </div>
- <div class="time-box"></div>
- <div class="poem">
- <p>「世间行乐亦如此,古来万事东流水。」</p>
- <p class="author">—— 《梦游天姥吟留别》</p>
- </div>
-
- <div class="background"></div>
-
- <script src="https://kit.fontawesome.com/44e73cd2d1.js" crossorigin="anonymous"></script>
- <script>
- const search = (query) => {
- window.open('/web?query=' + encodeURIComponent(query), '_blank')
- }
-
- const oSearch = document.querySelector('.search-btn')
- oSearch.addEventListener('focus', () => {oSearch.placeholder = ''})
- oSearch.addEventListener('blur', () => {oSearch.placeholder = '搜索'})
- oSearch.addEventListener('keydown', (event) => {
- if (event.key === 'Enter' && oSearch.value.trim().length !== 0) {
- search(oSearch.value.trim())
- oSearch.value = ''
- oSearch.blur()
- }
- })
-
- document.querySelector('.search-submit').addEventListener('click', () => {
- if (oSearch.value.trim().length !== 0) {
- search(oSearch.value.trim())
- oSearch.value = ''
- }
- })
-
- const oTimeBox = document.querySelector('.time-box')
- const updateTime = () => {
- let now = new Date()
- let hour = now.getHours()
- let minute = now.getMinutes()
- if (hour < 10) {
- hour = '0' + hour
- }
- if (minute < 10) {
- minute = '0' + minute
- }
-
- oTimeBox.textContent = `${hour}:${minute}`
-
- let second = now.getSeconds()
- let r = 60 - second
- setTimeout(updateTime, r * 1000)
- }
- updateTime()
- </script>
- </body>
- </html>
search.html
- <!DOCTYPE html>
- <html lang="zh-hans" xmlns:th="https://www.thymeleaf.org">
- <head>
- <meta charset="UTF-8">
- <title th:text="${query} + ' - 神马搜索'"></title>
- <link rel="stylesheet" href="/query.css">
- </head>
- <body>
- <!-- th:xxx attributes are Thymeleaf syntax -->
- <!-- <div th:text="'Hello ' + ${name} + ' world'"></div>-->
- <div class="header">
- <div class="brand"><a href="/">神马搜索</a></div>
- <form class="input-shell" method="get" action="/web">
- <input type="text" name="query" th:value="${query}">
- <button>神马搜索</button>
- </form>
- </div>
-
- <div class="result">
- <!-- th:utext vs th:text: th:text HTML-escapes the value, th:utext does not -->
- <!-- <div th:text="'<span>Hello th:text</span>'"></div>-->
- <!-- <div th:utext="'<span>Hello th:utext</span>'"></div>-->
-
- <div class="result-item" th:each="doc : ${docList}">
- <a th:href="${doc.url}" th:text="${doc.title}"></a>
- <div class="desc" th:utext="${doc.desc}"></div>
- <div class="url" th:text="${doc.url}"></div>
- </div>
- </div>
-
- <!-- <div class="result">-->
- <!-- <div th:each="item : ${testList}">-->
- <!-- <span th:text="${item}"></span>-->
- <!-- </div>-->
- <!-- </div>-->
-
- <!-- Repeatedly clicking "previous" can lead to page <= 0 -->
- <!-- Repeatedly clicking "next" can lead to page > the last page -->
- <div class="pagination">
- <a th:href="@{/web(query=${query},page=${page - 1})}">上一页</a>
- <a th:href="@{/web(query=${query},page=${page + 1})}">下一页</a>
- </div>
- </body>
- </html>
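The two comments in the template point out that the previous/next links can run past either end. One way to guard against that on the server side is to clamp the requested page before querying. The helper below is only a sketch: `clampPage`, `totalItems` and `pageSize` are hypothetical names, and it assumes pages are numbered from 1.

```java
class PageSketch {
    // Forces page into the range [1, lastPage]; an empty result set still has one (empty) page.
    static int clampPage(int page, int totalItems, int pageSize) {
        int lastPage = Math.max(1, (totalItems + pageSize - 1) / pageSize); // integer ceiling division
        return Math.min(Math.max(page, 1), lastPage);
    }

    public static void main(String[] args) {
        System.out.println(clampPage(0, 95, 20)); // 1
        System.out.println(clampPage(3, 95, 20)); // 3
        System.out.println(clampPage(9, 95, 20)); // 5
    }
}
```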
query.css
- * {
- margin: 0;
- padding: 0;
- box-sizing: border-box;
- }
-
- .header {
- width: 100%;
- height: 80px;
- position: fixed; /* stays in place while the page scrolls */
- left: 0;
- top: 0;
- background-color: #eee;
- border-bottom: 1px solid #ccc;
- padding-left: 120px;
-
- display: flex;
- align-items: center;
- }
-
- .brand {
- margin-right: 120px;
- }
-
- .brand a {
- color: inherit;
- text-decoration: none;
- }
-
- .input-shell {
- width: 800px;
- height: 52px;
- border: 1px solid #aaa;
- border-radius: 4px;
-
- display: flex;
- align-items: stretch;
- justify-content: space-between;
- }
-
- .input-shell:focus-within, /* :focus-within: the element or one of its descendants has focus */
- .input-shell:hover { /* :hover: the pointer is over the element */
- border: 1px solid #888;
- }
-
- .input-shell input {
- border: none;
- outline: none;
- width: 600px;
- padding-left: 8px;
- font-size: 22px;
- }
-
- .input-shell button {
- border: none;
- outline: none;
- width: 200px;
- border-left: 1px solid #ccc;
- }
-
- .result {
- margin-top: 88px;
- width: 100%;
- padding-left: 120px;
- }
-
- .result-item {
- display: flex;
- flex-direction: column;
- margin-bottom: 20px;
- align-items: start;
- }
-
- .result-item a {
- font-size: 22px;
- font-weight: 700;
- color: rgb(42, 107, 205);
- }
-
- .result-item .desc {
- font-size: 18px;
- }
-
- .result-item .url {
- font-size: 18px;
- color: rgb(0, 128, 0);
- }
-
- .result-item .desc i {
- color: red;
- font-style: normal;
- }
-
- .pagination {
- display: flex;
- align-items: center;
- justify-content: space-around;
- margin-bottom: 12px;
- }
style.css
- * {
- margin: 0;
- padding: 0;
- box-sizing: border-box;
- }
-
- body {
- width: 100vw;
- height: 100vh;
-
- display: flex;
- align-items: center;
- justify-content: center;
-
- position: relative;
- overflow: hidden;
- }
-
- .container {
- z-index: 1;
-
- height: 60px;
- background-color: rgba(255, 255, 255, .7);
- padding: 0 8px;
- border-radius: 30px;
- backdrop-filter: blur(4px);
- box-shadow: 0 0 5px 1px gray;
-
- display: flex;
- align-items: center;
- justify-content: space-around;
- }
-
- .time-box {
- z-index: 1;
- position: absolute;
- background-color: transparent;
- height: 40px;
- top: 40%;
- line-height: 40px;
- font-size: 40px;
- text-align: center;
- color: #fff;
- text-shadow: 0 0 4px #000;
- }
-
- .search-box {
- width: 200px;
- transition: all .3s ease-in-out;
- }
-
- .container:hover .search-box,
- .container:focus-within .search-box {
- width: 440px;
- }
-
- .container .item {
- margin: auto 20px;
- font-size: 20px;
- opacity: 0;
- transition: all .3s ease;
- transition-delay: .3s; /* must come after the transition shorthand, which would otherwise reset the delay to 0 */
- }
-
- .container:focus-within .item {
- opacity: 1;
- }
-
- .container .search-submit {
- display: inline-block;
- height: 40px;
- width: 40px;
- text-align: center;
- line-height: 40px;
- border-radius: 50%;
- cursor: pointer;
- }
-
- .container .search-submit:hover {
- background-color: rgba(255, 255, 255, .6);
- }
-
- .container .search-btn {
- width: 100%;
- border: none;
- outline: none;
- text-align: center;
- background: inherit;
- font-size: 20px;
- transition: all .5s ease-in-out;
- }
-
- .container .search-btn::placeholder {
- color: rgba(230, 230, 230, .9);
- text-shadow: 0 0 4px #000;
- transition: all .2s ease-in-out;
- }
-
- .container:hover .search-btn::placeholder,
- .container:focus-within .search-btn::placeholder {
- color: rgba(119, 119, 119, .9);
- text-shadow: 0 0 4px #f3f3f3;
- }
-
- .background {
- position: absolute;
- top: 0;
- right: 0;
- bottom: 0;
- left: 0;
-
- background-image: url(./bg.jpg);
- background-repeat: no-repeat;
- background-size: cover;
- background-position: center;
- object-fit: cover;
-
- transition: all .2s ease-in-out;
- }
-
- .container:focus-within ~ .background {
- filter: blur(20px);
- transform: scale(1.2);
- }
-
- .poem {
- z-index: 1;
- position: absolute;
- top: 70%;
- color: #ddd;
- text-shadow: 0 0 2px #000;
- opacity: 0;
- transition: all .2s ease-in-out;
- padding: 12px 32px;
- border-radius: 8px;
- line-height: 2;
- }
-
- .poem .author {
- opacity: 0;
- text-align: center;
- transition: all .2s ease-in-out;
- }
-
- .container:focus-within ~ .poem {
- opacity: 1;
- }
-
- .container:focus-within ~ .poem:hover {
- background-color: rgba(255, 255, 255, .3);
- opacity: 1;
- }
-
- .container:focus-within ~ .poem:hover .author {
- opacity: 1;
- }
- @Data
- public class Document {
- private Integer docId;
- private String title;
- private String url;
- private String content;
- private String desc;
-
- @Override
- public String toString() {
- return String.format("Document{docId=%d, title=%s, url=%s}", docId, title, url);
- }
- }
- @Slf4j
- @Component
- public class DescBuilder {
- public Document build(List<String> queryList, Document doc) {
- // Find where the content contains a query keyword
- // query = "list"
- // content = "..... hello list go come do ...."
- // desc = "hello <i>list</i> go com..."
-
- String content = doc.getContent().toLowerCase();
- String word = "";
- int i = -1;
- for (String query : queryList) {
- i = content.indexOf(query);
- if (i != -1) {
- word = query;
- break;
- }
- }
- if (i == -1) {
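- // If this happens, the inverted index was built incorrectly
- log.error("docId = {} does not contain {}", doc.getDocId(), queryList);
- throw new RuntimeException("docId = " + doc.getDocId() + " does not contain any of " + queryList);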
- }
-
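- // Take up to 120 characters before the match and 120 characters from its start
- int from = i - 120;
- if (from < 0) {
- // fewer than 120 characters before the match
- from = 0;
- }
-
- int to = i + 120;
- if (to > content.length()) {
- // fewer than 120 characters after the match
- to = content.length();
- }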
- }
-
- String desc = content.substring(from, to);
-
- desc = desc.replace(word, "<i>" + word + "</i>");
-
- doc.setDesc(desc);
-
- return doc;
- }
- }
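Because `build` works on the lowercased content, the snippet it produces is lowercased as well. If the original casing should be kept, a variant along the following lines might work: it searches case-insensitively but cuts the snippet from the original text. This is a sketch with hypothetical names (`SnippetSketch`, `snippet`, `radius`); it assumes that lowercasing does not change the string length and that the content contains no HTML tags, since the result is rendered with `th:utext`.

```java
import java.util.regex.Pattern;

class SnippetSketch {
    // Returns up to `radius` characters around the first match of word, with every match wrapped in <i>...</i>.
    static String snippet(String content, String word, int radius) {
        int i = content.toLowerCase().indexOf(word.toLowerCase());
        if (i == -1) {
            return ""; // no match; the caller decides how to handle it
        }
        int from = Math.max(0, i - radius);
        int to = Math.min(content.length(), i + word.length() + radius);
        String desc = content.substring(from, to);
        // (?i) makes the match case-insensitive; $0 re-inserts the matched text with its original casing.
        return desc.replaceAll("(?i)" + Pattern.quote(word), "<i>$0</i>");
    }

    public static void main(String[] args) {
        System.out.println(snippet("Hello List go come do", "list", 6)); // Hello <i>List</i> go co
    }
}
```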