Using the built-in analyzer
The built-in analyzer splits a Chinese sentence into individual characters, which is of little use for real search.
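You can see this behavior with the _analyze API; a sketch, assuming ES is running at localhost:9200:

```shell
# Ask the built-in standard analyzer to tokenize a Chinese phrase.
# Without a Chinese dictionary it emits one token per character.
curl -X POST "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "中华人民共和国"
}'
```

Each character (中, 华, 人, ...) comes back as its own token, so phrase and word-level matching is effectively impossible.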
Using the IK analyzer
Resources
https://github.com/medcl/elasticsearch-analysis-ik/releases
Link: https://pan.baidu.com/s/1dTzBN6fr1ieks25qDqA26A
Extraction code: 0cc3
Create an ik directory under the plugins directory of the ES installation:
mkdir /usr/local/es/elasticsearch-7.2.0/plugins/ik
Install the unzip command:
yum -y install unzip
Unzip the plugin package (into the ik directory created above):
unzip elasticsearch-analysis-ik-7.2.0.zip
Restart ES for the plugin to take effect.
Using IK
ik_max_word: performs the finest-grained segmentation, producing as many terms as possible.
ik_smart: performs the coarsest-grained segmentation; text already claimed by one term will not be split again into other terms.
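The ik_max_word result shown below can be produced with an _analyze request like this (a sketch, assuming ES at localhost:9200):

```shell
# Tokenize with IK's fine-grained mode; expect overlapping terms
# such as 中华人民共和国, 中华人民, 中华, 华人, and so on.
curl -X POST "localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}'
```

Swapping the analyzer to ik_smart on the same text would instead return only the coarsest non-overlapping split.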
{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 }
  ]
}
Create an index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": {
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": { "type": "text", "analyzer": "ik_max_word" }
    }
  }
}
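These settings and mappings can be submitted when creating the index; a sketch, assuming ES at localhost:9200 and a hypothetical index name "user":

```shell
# Create an index whose "username" field is analyzed with ik_max_word.
curl -X PUT "localhost:9200/user" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": { "tokenizer": "ik_max_word" }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": { "type": "text", "analyzer": "ik_max_word" }
    }
  }
}'
```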
Add data
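For example, indexing a document into the index created above (the index name "user" and the field value are assumptions for illustration):

```shell
# Index a sample document; the username field will be
# analyzed with ik_max_word as configured in the mapping.
curl -X POST "localhost:9200/user/_doc/1" -H 'Content-Type: application/json' -d'
{
  "username": "你好史珍香,我朴国昌"
}'
```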
Query
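The token output below comes from an _analyze request; a sketch, with the input text inferred from the offsets in the response (an assumption):

```shell
# Before any custom dictionary is added, an uncommon name
# like 朴国昌 falls apart into single CN_CHAR tokens.
curl -X POST "localhost:9200/user/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "你好,我朴国昌"
}'
```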
{
  "tokens": [
    { "token": "你好", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "我", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 1 },
    { "token": "朴", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 2 },
    { "token": "国", "start_offset": 5, "end_offset": 6, "type": "CN_CHAR", "position": 3 },
    { "token": "昌", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 4 }
  ]
}
Create a custom directory
mkdir custom
Create an extension dictionary and a stopword dictionary:
custom/myext.dic
custom/myext_stopword.dic
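The dictionary files are plain text with one term per line. Based on the tokens that appear after the restart, myext.dic would contain entries like these (the exact words are an assumption drawn from that output):

```shell
# Create the custom directory and write custom words,
# one per line, into the extension dictionary.
mkdir -p custom
cat > custom/myext.dic <<'EOF'
史珍香
朴国昌
EOF
```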
Edit the IK configuration file to register the custom dictionaries:
vim IKAnalyzer.cfg.xml
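The relevant entries in IKAnalyzer.cfg.xml point at the two files created above; a sketch of the configuration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Custom extension dictionary -->
    <entry key="ext_dict">custom/myext.dic</entry>
    <!-- Custom extension stopword dictionary -->
    <entry key="ext_stopwords">custom/myext_stopword.dic</entry>
</properties>
```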
Restart ES
{
  "tokens": [
    { "token": "你好", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "史珍香", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 1 },
    { "token": "我", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 2 },
    { "token": "朴国昌", "start_offset": 7, "end_offset": 10, "type": "CN_WORD", "position": 3 }
  ]
}