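1. Download the pinyin analysis plugin. Its version must match the installed es version; mine is 7.9.3.
Plugin source: https://github.com/medcl/elasticsearch-analysis-pinyin
I could not find a matching release for this version,
so I had to download the 7.9.3 source code myself.
2. Once downloaded, build it with Maven by running mvn clean package; the zip package is generated in the releases directory.
The generated release zip turned out to be version 7.7:
elasticsearch-analysis-pinyin-7.7.0.zip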
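3. Unzip it, rename the folder to pinyin, and put it under the plugins directory of es. After restarting es, it still reported a version error,
so I edited plugin-descriptor.properties:
version=7.9.3
elasticsearch.version=7.9.3
After restarting es again, it ran normally.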
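The descriptor edit in step 3 can also be scripted. The following is a minimal sketch, not part of the original steps: the helper names patch_descriptor and patch_file are hypothetical, and the descriptor path in the usage note is an assumption about where the plugin was unpacked. Changing the declared version only gets past the startup check; it does not by itself guarantee that the plugin code is compatible with the running es.

```python
from pathlib import Path


def patch_descriptor(text: str, es_version: str) -> str:
    """Return descriptor text with `version` and `elasticsearch.version`
    set to es_version; every other line is left untouched."""
    wanted = {"version", "elasticsearch.version"}
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in wanted:
            out.append(f"{key.strip()}={es_version}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def patch_file(path: str, es_version: str) -> None:
    """Rewrite a plugin-descriptor.properties file in place (hypothetical helper)."""
    descriptor = Path(path)
    original = descriptor.read_text(encoding="utf-8")
    descriptor.write_text(patch_descriptor(original, es_version), encoding="utf-8")


# Demonstration on an in-memory sample instead of a real file.
sample = (
    "description=Pinyin Analysis for Elasticsearch\n"
    "version=7.7.0\n"
    "name=analysis-pinyin\n"
    "java.version=1.8\n"
    "elasticsearch.version=7.7.0\n"
)
print(patch_descriptor(sample, "7.9.3"), end="")
```

Against a real installation this would be called as patch_file("plugins/pinyin/plugin-descriptor.properties", "7.9.3"), with the path adjusted to the actual es home directory.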
Testing
Create the index:
PUT /medcl/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": true,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
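Parameter descriptions:
keep_first_letter: when enabled, the first letters are joined into one term, e.g. 刘德华 > ldh. Default: true
keep_separate_first_letter: when enabled, the first letters are kept as separate terms, e.g. 刘德华 > l,d,h. Default: false. Note: query results may be too fuzzy, because such terms are too frequent
keep_full_pinyin: when enabled, the full pinyin of each character is kept, e.g. 刘德华 > [liu,de,hua]. Default: true
keep_original: when enabled, the original input is kept as well. Default: false
limit_first_letter_length: maximum length of the first_letter result. Default: 16
lowercase: lowercase non-Chinese letters. Default: true
remove_duplicated_term: when enabled, duplicated terms are removed to save index space, e.g. de的 > de. Default: false. Note: position-related queries may be affected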
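To make the effect of these options concrete, here is a toy model of them. It is not the plugin's implementation: the function pinyin_tokens is hypothetical, the pinyin syllables are supplied by hand rather than looked up, the order of the returned terms is only illustrative, and the lowercase option is not modeled.

```python
def pinyin_tokens(original, syllables, *, keep_first_letter=True,
                  keep_separate_first_letter=False, keep_full_pinyin=True,
                  keep_original=False, limit_first_letter_length=16,
                  remove_duplicated_term=False):
    """Toy model: list the terms each option contributes for one input."""
    tokens = []
    if keep_separate_first_letter:
        # one term per first letter: l, d, h
        tokens += [s[0] for s in syllables]
    if keep_full_pinyin:
        # one term per full syllable: liu, de, hua
        tokens += list(syllables)
    if keep_original:
        # the untouched input text
        tokens.append(original)
    if keep_first_letter:
        # all first letters joined and truncated: ldh
        joined = "".join(s[0] for s in syllables)
        tokens.append(joined[:limit_first_letter_length])
    if remove_duplicated_term:
        # drop repeated terms while keeping first-seen order
        tokens = list(dict.fromkeys(tokens))
    return tokens


# Defaults as listed above.
print(pinyin_tokens("刘德华", ["liu", "de", "hua"]))
# Options that differ from the defaults in the index settings of this post.
print(pinyin_tokens("刘德华", ["liu", "de", "hua"],
                    keep_original=True, remove_duplicated_term=True))
```

In this model, turning on keep_original adds 刘德华 itself as a term, which is why the index settings above can match both the Chinese name and its pinyin forms.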
POST /medcl/_analyze
{
"text": ["刘德华"],
"analyzer": "pinyin_analyzer"
}
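The _analyze request above can also be sent from a script rather than from the Kibana console. This is a minimal sketch using only the Python standard library; the address http://localhost:9200 and the absence of authentication are assumptions, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_request(method, path, body):
    """Build (but do not send) a JSON request equivalent to a console entry."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


analyze = build_request(
    "POST",
    "/medcl/_analyze",
    {"text": ["刘德华"], "analyzer": "pinyin_analyzer"},
)

# With a running cluster, the produced terms can be listed like this:
# for token in send(analyze)["tokens"]:
#     print(token["token"], token["position"])
```

The same build_request helper works for the other JSON requests in this post by changing the method, path, and body.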
POST /medcl/_mapping
{
  "properties": {
    "name": {
      "type": "keyword",
      "fields": {
        "pinyin": {
          "type": "text",
          "store": false,
          "term_vector": "with_offsets",
          "analyzer": "pinyin_analyzer",
          "boost": 10
        }
      }
    }
  }
}
POST /medcl/_bulk
{"index":{"_index":"medcl"}}
{"name":"刘德华"}
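The _bulk body is newline-delimited JSON: one action line, then one document line, and the final line must also end with a newline. When more than one document has to be indexed, the body can be generated. The sketch below assumes nothing beyond the standard library; bulk_body is a hypothetical helper name.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /<index>/_bulk from a list of documents."""
    lines = []
    for doc in docs:
        # action line: index the following document into `index`
        lines.append(json.dumps({"index": {"_index": index}}))
        # source line: the document itself, keeping Chinese text readable
        lines.append(json.dumps(doc, ensure_ascii=False))
    # the bulk API requires a trailing newline after the last line
    return "\n".join(lines) + "\n"


body = bulk_body("medcl", [{"name": "刘德华"}])
print(body, end="")
```

When sending this body over HTTP, the Content-Type header should be application/x-ndjson. If a search issued right after the bulk request returns nothing, appending ?refresh=true to the bulk URL makes the documents searchable immediately.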
POST /medcl/_search
{
"query":{
"match": {
"name.pinyin": {
"query": "ldh"
}
}
}
}
Result:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.3439677,
    "hits": [
      {
        "_index": "medcl",
        "_type": "_doc",
        "_id": "aNyPaXYBMZ73IDxRFEYg",
        "_score": 0.3439677,
        "_source": {
          "name": "刘德华"
        }
      }
    ]
  }
}