
Implementing HTML page search with ES: Middleware Series, Elasticsearch part 4, a small hands-on ES project that imitates JD search

Storing HTML page content in ES

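Preparing the data

The data is scraped with Jsoup. Jsoup is a Java HTML parser that can parse a URL or a string of HTML directly, and it lets you extract and manipulate data through the DOM, CSS selectors, and jQuery-like methods. For details, see the Jsoup user manual (Chinese edition).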

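Taking the JD search results page as an example, inspecting the page source shows that the product information sits inside a single div (the one with id J_goodsList, which the code below selects), with one li element per product.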

The code is as follows:

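import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// One product entry scraped from the JD search results page
@Data
@AllArgsConstructor
@NoArgsConstructor
public class JdContent {
    private String img;    // product image URL
    private String price;  // price text
    private String title;  // product title
}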


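import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

@Component
public class HtmlParseUtil {
    // JD search URL; the search keyword (for example "Java") is appended to it
    private final String url = "https://search.jd.com/Search?keyword=";

    public List<JdContent> parseJD(String keyword) throws IOException {
        // Fetch and parse the page with a 3000 ms timeout; the returned
        // Document plays the same role as the document object in browser JavaScript
        Document document = Jsoup.parse(new URL(url + keyword), 3000);
        // Look up the container element by its id
        Element element = document.getElementById("J_goodsList");
        List<JdContent> list = new ArrayList<>();
        if (element == null) {
            // The container is missing (for example the page layout changed)
            return list;
        }
        // Collect all li elements inside the container
        Elements elements = element.getElementsByTag("li");
        for (Element el : elements) {
            String img = el.getElementsByTag("img").eq(0).attr("src");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            list.add(new JdContent(img, price, title));
        }
        return list;
    }
}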
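
Search keywords are not always plain ASCII (a Chinese product name, for instance), so it can be worth percent-encoding the keyword before appending it to the search URL. The following is a minimal sketch using only the JDK, assuming Java 10 or later for the `URLEncoder.encode(String, Charset)` overload. The class and method names (`SearchUrls`, `buildSearchUrl`) are illustrative and do not appear in the original code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name, not part of the original article)
final class SearchUrls {
    private static final String BASE = "https://search.jd.com/Search?keyword=";

    // Percent-encodes the keyword as UTF-8 and appends it to the base URL
    static String buildSearchUrl(String keyword) {
        return BASE + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java"));
        // The two-character Chinese word for "mobile phone", written with Unicode escapes
        System.out.println(buildSearchUrl("\u624B\u673A"));
    }
}
```

`URLEncoder` applies form encoding, so a space in the keyword becomes `+`, which is acceptable inside a query string.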


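Note, however, that the image address may not be retrievable this way. To speed up page loads (for example on slow connections), sites usually lazy-load their images, so the src attribute may be empty when the page is fetched. Looking at the page source again, the img tag carries a source-data-lazy-img attribute, and the image address can be read from it instead: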


String img = el.getElementsByTag("img").eq(0).attr("source-data-lazy-img");
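
Since either attribute may be missing on a given element, one option is to prefer the lazy-load attribute and fall back to src. Jsoup's `attr` returns an empty string when an attribute is absent, so the choice reduces to plain string handling. The sketch below assumes a hypothetical helper named `ImageUrls.pick`; the handling of protocol-relative addresses (values starting with `//`) is also an assumption about the page, not something the original code does.

```java
// Illustrative helper (hypothetical name): chooses the image address from the
// two attribute values read off an <img> element.
final class ImageUrls {
    // lazySrc: value of the lazy-load attribute; src: value of the plain src attribute.
    // Either may be null or empty when the attribute is absent.
    static String pick(String lazySrc, String src) {
        String chosen = (lazySrc != null && !lazySrc.isEmpty()) ? lazySrc : src;
        if (chosen == null || chosen.isEmpty()) {
            return "";
        }
        // A protocol-relative address ("//host/path") needs a scheme to be usable outside a page
        return chosen.startsWith("//") ? "https:" + chosen : chosen;
    }

    public static void main(String[] args) {
        System.out.println(pick("//img.example.com/a.jpg", ""));
        System.out.println(pick("", "http://img.example.com/b.png"));
    }
}
```

With Jsoup this could be called as `ImageUrls.pick(imgEl.attr("source-data-lazy-img"), imgEl.attr("src"))`, where `imgEl` is the first img element of the list item.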


Writing the business logic

Inserting data

Insert the scraped data into the jd index, which has already been created:

@Autowired
private RestHighLevelClient restHighLevelClient;

@Autowired
private HtmlParseUtil htmlParseUtil;

// Scrapes the results for the keyword and bulk-indexes them into the jd index
public boolean parseContent(String keyword) throws IOException {
    List<JdContent> jdContents = htmlParseUtil.parseJD(keyword);
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("1m");
    for (JdContent jdContent : jdContents) {
        // Serialize each item to JSON and add it as one index operation
        bulkRequest.add(new IndexRequest("jd").source(JSON.toJSONString(jdContent), XContentType.JSON));
    }
    BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    // true only if every item was indexed without failure
    return !bulkResponse.hasFailures();
}


Providing the query

public List<Map<String, Object>> search(String keyword, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Conditional search against the jd index
    SearchRequest searchRequest = new SearchRequest("jd");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Paging: from is the zero-based offset of the first hit, not the page number
    sourceBuilder.from((pageNo - 1) * pageSize);
    sourceBuilder.size(pageSize);
    // Match the keyword against the title field
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", keyword);
    sourceBuilder.query(matchQueryBuilder);
    sourceBuilder.timeout(new TimeValue(50, TimeUnit.SECONDS));
    // Run the search
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Collect the source document of every hit
    List<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : searchResponse.getHits().getHits()) {
        list.add(hit.getSourceAsMap());
    }
    return list;
}
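
In Elasticsearch, `from` is the zero-based offset of the first hit to return (default 0), and `size` is the number of hits per request, so a 1-based page number has to be converted into an offset before it is passed to `from`. The conversion can be sketched as follows; `Paging.offset` is a hypothetical helper name used only for illustration.

```java
// Illustrative helper (hypothetical name): converts page numbers into offsets
final class Paging {
    // Returns the zero-based offset of the first hit on the given 1-based page.
    // Page numbers below 1 are treated as page 1.
    static int offset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 10)); // first page starts at hit 0
        System.out.println(offset(3, 10)); // third page starts at hit 20
    }
}
```

Note that with default index settings `from + size` may not exceed 10000 (the `index.max_result_window` setting), so this style of paging suits the first pages of a result list rather than deep scrolling.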


Front-end interaction

new Vue({
    el: '#app',
    data: {
        keyword: '',   // the search keyword
        results: []    // the search results
    },
    methods: {
        searchKey() {
            var keyword = this.keyword;
            // console.log(keyword);
            // Request page 1 with 10 results per page
            axios.get('search/' + keyword + '/1/10').then(response => {
                // console.log(response);
                this.results = response.data;
            })
        }
    }
})


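Here, app is the id of the root element, keyword is the name of the model bound to the input field, and searchKey is the handler triggered by the search action.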

Search


Then simply iterate over the returned results:

¥{{result.price}}

{{result.title}}


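Keyword highlighting

The overall logic is the same as for the plain query. The only additions are a custom highlight configuration on the request and, when reading the ES response, replacing the original field content with the highlighted content.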

public List<Map<String, Object>> searchHignLight(String keyword, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Conditional search against the jd index
    SearchRequest searchRequest = new SearchRequest("jd");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Paging: from is the zero-based offset of the first hit, not the page number
    sourceBuilder.from((pageNo - 1) * pageSize);
    sourceBuilder.size(pageSize);
    // Match the keyword against the title field
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", keyword);
    sourceBuilder.query(matchQueryBuilder);
    sourceBuilder.timeout(new TimeValue(50, TimeUnit.SECONDS));
    // Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    // HTML placed before and after each matched term; supply the opening
    // and closing tags that the page should render around a hit
    highlightBuilder.preTags("");
    highlightBuilder.postTags("");
    // false: highlight the field even if the query did not match on that field
    highlightBuilder.requireFieldMatch(false);
    sourceBuilder.highlighter(highlightBuilder);
    // Run the search
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Parse the results
    List<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : searchResponse.getHits().getHits()) {
        // Read the highlighted version of the title, if any
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField title = highlightFields.get("title");
        // Replace the original field with its highlighted version
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        if (title != null) {
            StringBuilder temValue = new StringBuilder();
            for (Text fragment : title.fragments()) {
                temValue.append(fragment);
            }
            sourceAsMap.put("title", temValue.toString()); // replace
        }
        list.add(sourceAsMap);
    }
    return list;
}
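
The replacement step can be looked at in isolation: given the source map of a hit and the highlighted fragments for one field, join the fragments and overwrite that field, leaving every other field untouched. The following sketch uses only the JDK and represents fragments as plain strings; `HighlightMerge.apply` is a hypothetical helper name, and the `<em>` tags in the sample data are Elasticsearch's default highlight tags, used here purely as example input.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper (hypothetical name): merges highlight fragments into a hit's source
final class HighlightMerge {
    // Returns a copy of source in which field holds the joined fragments.
    // When there are no fragments the copy keeps the original value.
    static Map<String, Object> apply(Map<String, Object> source, String field, List<String> fragments) {
        Map<String, Object> result = new HashMap<>(source);
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("title", "Java programming");
        source.put("price", "59.00");
        List<String> fragments = Arrays.asList("<em>Java</em> programming");
        // The title now carries the markup; the price is unchanged
        System.out.println(apply(source, "title", fragments));
    }
}
```

Working on a copy rather than on the map returned by the hit keeps the original source available, which can help if both the plain and the highlighted title are needed.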


The front-end page then only needs to render the returned HTML.

