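1. These notes apply only to Elasticsearch 6.8.
2. If a script fails when executed, paste it into a text editor and check whether its whitespace characters are encoded correctly;
for example, open it via Edit with Notepad++ and inspect the spaces.
3. Elasticsearch: The Definitive Guide (the book is based on Elasticsearch 2.x, so some content may be outdated, but parts of it are still useful as a reference)
For API usage conventions such as multi-index wildcards and exclusions, see: Elasticsearch API Conventions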
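The whitespace check from note 2 can also be done on the command line. The sketch below assumes the offending characters are a non-breaking space or an ideographic full-width space, which is only a guess at the usual culprits after copying from a web page; `find_odd_spaces` is a hypothetical helper name.

```shell
# Hypothetical helper: list lines of a script that contain a non-breaking
# space (UTF-8 bytes C2 A0) or an ideographic full-width space (E3 80 80).
# Both look like a normal space but break shell commands and JSON bodies.
find_odd_spaces() {
  nbsp=$(printf '\302\240')
  wide=$(printf '\343\200\200')
  LC_ALL=C grep -a -n -e "$nbsp" -e "$wide" -- "$1"
}

# Demo on a throwaway file: line 2 hides a non-breaking space after "curl".
demo=$(mktemp)
printf 'curl -XGET localhost:9200\ncurl\302\240-XGET localhost:9200\n' > "$demo"
find_odd_spaces "$demo" | cut -d: -f1
rm -f "$demo"
```

Any line number printed by the helper is a line worth retyping by hand.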
Official reference: X-Pack APIs
Official reference: Security APIs
To change a password, call the API with either an Authorization header or curl's -u option:
curl -XPUT -H 'Content-Type: application/json' -H "Authorization: Basic <Base64 of user:password>" 'http://localhost:9200/_xpack/security/user/elastic/_password' -d'{"password":"new_password"}'
curl -XPUT -H 'Content-Type: application/json' -u elastic:123456 'http://localhost:9200/_xpack/security/user/elastic/_password' -d'{"password":"new_password"}'
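The `Authorization: Basic` value used throughout these notes is the Base64 encoding of `user:password`. A minimal sketch for producing it in a shell follows; `basic_auth` is a hypothetical helper name, and `elastic:123456` is the sample credential from the `-u` example.

```shell
# Hypothetical helper: print the value for an "Authorization" header.
# printf is used instead of echo so that no trailing newline gets encoded;
# tr removes the line wrapping some base64 tools add for long input.
basic_auth() {
  printf 'Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

auth=$(basic_auth elastic 123456)
echo "$auth"
# Usage (assumes a node on localhost:9200):
#   curl -s -H "Authorization: $auth" "http://localhost:9200/_cluster/health?pretty"
```

For `elastic:123456` this yields the same `ZWxhc3RpYzoxMjM0NTY=` value that appears literally in the terms query example later in these notes.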
Official reference: Cluster APIs
Official reference: Cluster Health
Example:
curl -H "Authorization: Basic <Base64 of user:password>" -X GET "http://localhost:9200/_cluster/health?pretty" -s
Official reference: cat nodes
Example:
curl -H "Authorization: Basic <Base64 of user:password>" -X GET "http://localhost:9200/_cat/nodes?v&h=id,ip,port,n,hc,hm,rc,rm,cs,fm,qcm,rcm,sqto,sm&pretty"
View segment memory:
curl -H "Authorization: Basic <Base64 of user:password>" -X GET "http://localhost:9200/_cat/nodes?v&h=name,port,sm"
Official reference: cat allocation
Example:
Column 3 is the used disk space (disk.used); column 5 is the total disk space (disk.total)
curl -X GET "http://localhost:9200/_cat/allocation?v" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json'
curl -X GET "http://localhost:9200/_cat/allocation" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json'
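Columns 3 and 5 can be turned into a usage percentage per node. The sketch below assumes the output was requested without the header (no `v`) and with `bytes=b` so that the sizes are plain numbers; `disk_used_percent` is a hypothetical helper and the sample row is made up.

```shell
# Hypothetical helper: read "_cat/allocation?bytes=b" rows (no header) on stdin
# and print "<node> <used-percent>" from column 3 (disk.used) and column 5 (disk.total).
# Rows with fewer than 9 columns (such as the UNASSIGNED row) are skipped.
disk_used_percent() {
  awk 'NF >= 9 && $5 > 0 { printf "%s %.1f\n", $NF, $3 * 100 / $5 }'
}

# Made-up row: shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
echo "6 1000 5000 15000 20000 25 127.0.0.1 127.0.0.1 node-1" | disk_used_percent
# Usage (assumes a node on localhost:9200):
#   curl -s -H "Authorization: Basic ..." "http://localhost:9200/_cat/allocation?bytes=b" | disk_used_percent
```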
Official reference: cat thread pool
Example:
View the write thread pool queue
curl -H "Authorization: Basic <Base64 of user:password>" -X GET "http://localhost:9200/_cat/thread_pool/write?v&h=id,name,threads,queue,active,rejected,largest,completed&pretty"
View the search thread pool queue
curl -H "Authorization: Basic <Base64 of user:password>" -X GET "http://localhost:9200/_cat/thread_pool/search?v&h=id,name,threads,queue,active,rejected,largest,completed&pretty"
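Official reference: Document APIs
Official reference: Index API
You can also configure whether indices are created automatically (if auto_create_index, i.e. the action.auto_create_index setting, is true, inserting a document into an index that does not exist creates that index automatically from the inserted data).
"index": "false" disables the inverted index for the field, so the field cannot be searched;
"doc_values": false means the field cannot be used for aggregations, sorting, or script access;
For fields that need none of these operations, such as display-only fields, or that need only one of them, these settings save disk space or speed up indexing;
See: Deep Dive on Doc Values
Example (this defines several custom analyzers; the plate number field uses my_analyzer by default, an ngram analyzer that emits tokens of 1 to 10 characters; this trades space for time to speed up fuzzy search):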
curl -H "Content-Type: application/json" -H "Authorization: Basic <Base64 of user:password>" -XPUT "http://localhost:9200/test_index" -d'
{
"settings": {
"index": { "number_of_shards": 6 },
"analysis": {
"analyzer": {
"my_analyzer": { "tokenizer": "my_tokenizer" },
"my_analyzer2": { "tokenizer": "my_tokenizer2" },
"my_analyzer3": { "tokenizer": "my_tokenizer3" },
"my_analyzer4": { "tokenizer": "my_tokenizer4" },
"my_analyzer5": { "tokenizer": "my_tokenizer5" },
"my_analyzer6": { "tokenizer": "my_tokenizer6" },
"my_analyzer7": { "tokenizer": "my_tokenizer7" },
"my_analyzer8": { "tokenizer": "my_tokenizer8" },
"my_analyzer9": { "tokenizer": "my_tokenizer9" },
"my_analyzer10": { "tokenizer": "my_tokenizer10" }
},
"tokenizer": {
"my_tokenizer": { "type": "ngram", "min_gram": 1, "max_gram": 10, "token_chars": [ "letter", "digit" ] },
"my_tokenizer2": { "type": "ngram", "min_gram": 2, "max_gram": 2, "token_chars": [ "letter", "digit" ] },
"my_tokenizer3": { "type": "ngram", "min_gram": 3, "max_gram": 3, "token_chars": [ "letter", "digit" ] },
"my_tokenizer4": { "type": "ngram", "min_gram": 4, "max_gram": 4, "token_chars": [ "letter", "digit" ] },
"my_tokenizer5": { "type": "ngram", "min_gram": 5, "max_gram": 5, "token_chars": [ "letter", "digit" ] },
"my_tokenizer6": { "type": "ngram", "min_gram": 6, "max_gram": 6, "token_chars": [ "letter", "digit" ] },
"my_tokenizer7": { "type": "ngram", "min_gram": 7, "max_gram": 7, "token_chars": [ "letter", "digit" ] },
"my_tokenizer8": { "type": "ngram", "min_gram": 8, "max_gram": 8, "token_chars": [ "letter", "digit" ] },
"my_tokenizer9": { "type": "ngram", "min_gram": 9, "max_gram": 9, "token_chars": [ "letter", "digit" ] },
"my_tokenizer10": { "type": "ngram", "min_gram": 10, "max_gram": 10, "token_chars": [ "letter", "digit" ] }
}
}
},
"mappings": {
"data": {
"properties": {
"myid": { "type": "keyword", "index": "true" },
"infokind": { "type": "integer", "index": "true" },
"subimagelist": { "type": "keyword", "index": "false", "doc_values": false },
"plateno": { "type": "text", "index": "true", "analyzer": "my_analyzer", "fields": { "keyword": { "type": "keyword" } } },
"speed": { "type": "double", "index": "true" },
"inserttime": { "type": "date", "index": "true" }
}
}
}
}'
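1. After an ES index is created, you can add new fields, but you cannot modify or delete existing ones.
2. By default, dynamic mapping is enabled (dynamic=true), so field types are inferred automatically and added to the mapping;
When a document contains a new field, the behavior depends on the dynamic setting:
a. dynamic=true: as soon as a document with a new field is written, the mapping is updated as well.
b. dynamic=false: the mapping is not updated; the new field is not indexed and therefore cannot be searched, but its value still appears in _source.
c. dynamic=strict: the document write fails.
3. To change an existing field, the only option is to reindex: create a target index with the correct fields, reindex the existing index into it, delete the existing index, then give the target index an alias equal to the old index name;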
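The reindex workflow in step 3 comes down to two request bodies, sketched below. The helper names and the `test_index_v2` target are hypothetical, and the sketch assumes the `remove_index` alias action described in the 6.x alias documentation, which deletes the old index and adds the alias in one atomic request. The target index is assumed to have been created beforehand with the corrected mapping.

```shell
# Hypothetical helpers that print the JSON bodies for the reindex workflow.

# Body for POST _reindex: copy every document from index $1 into index $2.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Body for POST _aliases: delete old index $1 and, in the same request,
# make its name an alias of the new index $2.
swap_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}},{"remove_index":{"index":"%s"}}]}\n' "$2" "$1" "$1"
}

reindex_body test_index test_index_v2
swap_body test_index test_index_v2
# Usage (assumes a node on localhost:9200):
#   curl -s -H 'Content-Type: application/json' -H "Authorization: Basic ..." \
#        -X POST "http://localhost:9200/_reindex?pretty" -d "$(reindex_body test_index test_index_v2)"
```

Doing the delete and the alias in a single `_aliases` call avoids a window in which the old name resolves to nothing.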
Add a new keyword field named new_field to the test_index index
curl -X PUT "http://localhost:9200/test_index/data/_mapping" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"properties": {
"new_field":{
"type":"keyword",
"index": "true"
}
}
}'
The default max_result_window in ES is 10000; if a query's from + size exceeds 10,000, it fails with:
Result window is too large
In that case, raise max_result_window as appropriate
curl -XPUT -H "Content-Type: application/json" -H "Authorization: Basic <Base64 of user:password>" "http://localhost:9200/test_index/_settings" -d'{"index.max_result_window" :"51200"}'
Index Aliases
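1. ES lets you define aliases for indices;
2. One alias can point to multiple indices, which is handy when a single query must cover several of them. Indices can be attached to the alias incrementally, which suits cases where data is split into indices along some dimension but certain logic needs to query them together;
curl -X POST "localhost:9200/_aliases?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'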
{
"actions" : [
{ "add" : { "index" : "test1", "alias" : "alias1" } }
]
}
'
Delete the test_index index
curl -XDELETE -H "Content-Type: application/json" -H "Authorization: Basic <Base64 of user:password>" "http://localhost:9200/test_index"
Official reference: Bulk API
Example:
curl -X POST "http://localhost:9200/test_index/_bulk?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{"index":{"_index":"test_index","_type":"data","_id":"33"}}
{"inserttime":12603864000,"subimagelist":"dfssdfdsfasdfdsfasfsfdfsafasfds","myid":"33_test","plateno":"湘JAS905","infokind":12345}
'
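A bulk body is newline-delimited JSON: one action line, one document line, and a newline after every line including the last. For larger batches it is easier to build the body in a file, as in the sketch below; `bulk_index_action` is a hypothetical helper and the two documents are made up.

```shell
# Hypothetical helper: print one bulk "index" action (two NDJSON lines).
# $1=index  $2=type  $3=id  $4=document as single-line JSON
bulk_index_action() {
  printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n%s\n' "$1" "$2" "$3" "$4"
}

# Collect actions in a file; every line, including the last, ends with \n.
body=$(mktemp)
bulk_index_action test_index data 34 '{"myid":"34_test","infokind":1}' >> "$body"
bulk_index_action test_index data 35 '{"myid":"35_test","infokind":2}' >> "$body"
wc -l < "$body"
# Send it with --data-binary so curl keeps the newlines (assumes localhost:9200):
#   curl -s -H 'Content-Type: application/json' -H "Authorization: Basic ..." \
#        -X POST "http://localhost:9200/_bulk?pretty" --data-binary "@$body"
```

When the body comes from a file, `-d @file` strips the newlines and the request is rejected, which is why `--data-binary` is used here.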
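Official reference:
Update API
Update By Query API
Scripting
Example:
test_index is the index name, corresponding to the document's "_index" field;
data is the type, corresponding to the "_type" field;
33 is the id, corresponding to the "_id" field;
doc is a fixed keyword; an update accepts only script or doc;
Always include _update in the URL;
Change the myid field of the document with id 33 to "updated_value", leaving the other fields unchanged;
Method 1:
curl -X POST "http://localhost:9200/test_index/data/33/_update" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"doc":{"myid": "updated_value"}
}
'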
Method 2:
curl -X POST "http://localhost:9200/test_index/data/33/_update" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.infokind = 123"
}
}
'
Method 3 (this form can update several documents in one request):
curl -X POST "http://localhost:9200/_bulk?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{"update":{"_index":"test_index","_type":"data","_id":"33"}}
{"doc":{"infokind":"3"}}
{"update":{"_index":"test_index","_type":"data","_id":"22"}}
{"doc":{"infokind":"2"}}
'
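Example:
Replace the document for an id so that it contains only the given fields and values:
After this runs, the document with id 33 keeps only the plateno and infokind fields; all other fields are gone
curl -X PUT "http://localhost:9200/test_index/data/33" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'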
{
"plateno": "湘JAS906",
"infokind": 1234
}
'
Update a field's value in bulk for all documents matching a query:
curl -X POST "http://localhost:9200/test_index/data/_update_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"myid": "11_test"
}
},
"script": {
"source": "ctx._source.infokind = 123"
}
}
'
Set a field to the same value in every document of the index:
curl -X POST "http://localhost:9200/test_index/data/_update_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.infokind = 321"
}
}
'
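Replace part of a string within a field:
1. If the field is null in some document, the script must check for null first, otherwise it fails;
2. Only strings can be replaced, and the quotes inside the script must be escaped;
curl -X POST "http://localhost:9200/test_index/data/_update_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'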
{
"script": {
"source": "ctx._source.plateno = ctx._source.plateno.replace(\"湘\",\"赣\")"
}
}
'
Replace part of a string conditionally, with a null check (a null plateno is set to a default value):
curl -X POST "http://localhost:9200/test_index/data/_update_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.plateno = ctx._source.plateno == null ? \"湘AAAAAA\" : ctx._source.plateno.replace(\"赣\",\"湘\")"
}
}
'
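Official reference:
Delete API
Delete By Query API
Note that, apart from deleting a whole index, deletes only mark documents as deleted; the data is not removed immediately and disk space is not released. If disk usage rises during a delete, segment merging is in progress, which increases disk I/O and space usage;
Another write-up on deletion: Deleting data in Elasticsearch with _delete_by_query
Note that the request body must use query; a top-level filter is not accepted. Examples:
curl -X POST "http://localhost:9200/test_index/_delete_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'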
{
"query": {
"match": {
"myid": "11"
}
}
}
'
curl -X POST "http://localhost:9200/test_index/_delete_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"myid": "11"
}
}
}
'
Delete documents whose status is 0, whose createtime is greater than 1549036701000, and whose myid is not 1;
curl -X POST "http://localhost:9200/test_index/_delete_by_query" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must_not": [
{ "term": { "myid": "1" } }
],
"must": [
{ "term": { "status": "0" } },
{ "range": { "createtime": { "gt": "1549036701000" } } }
]
}
}
}
'
Delete large amounts of data by a time condition
Parameters:
wait_for_completion=false runs the delete as an asynchronous task: the call returns a task id right away instead of waiting for the delete to finish. Query the task's status with GET _tasks/${taskId}, and cancel it by id if needed (POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel). Elasticsearch also stores a record of the task at .tasks/task/${taskId}, which you can delete once you are done with it.
scroll_size=3000 deletes in scroll batches of 3000 documents;
slices=5 splits the task into 5 slices that are processed concurrently;
conflicts=proceed ignores version conflicts instead of aborting;
curl -X POST "http://localhost:9200/*_index/_delete_by_query?wait_for_completion=false&conflicts=proceed&scroll_size=3000&slices=5" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"range" : {
"mytime" : {"lte":1678441960000}
}
}
}
'
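With wait_for_completion=false the response body is just the task id, so a script has to extract it before it can check or cancel the task. A minimal sketch follows; `task_id` is a hypothetical helper, and the sample response reuses the task id quoted in the parameter notes.

```shell
# Hypothetical helper: pull the task id out of the JSON returned by
# _delete_by_query?wait_for_completion=false, e.g. {"task":"node:123"}.
task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response; a real one comes from the curl call.
resp='{"task":"r1A2WoRbTwKZ516z6NEs5A:36619"}'
tid=$(printf '%s\n' "$resp" | task_id)

# The follow-up calls for this task (assumes a node on localhost:9200).
echo "GET  http://localhost:9200/_tasks/$tid"
echo "POST http://localhost:9200/_tasks/$tid/_cancel"
```

The sed pattern is deliberately simple and only expects the single `task` key; a JSON-aware tool is the safer choice when one is available.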
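Official reference: Get API
Official reference: Multi Get API
Official reference: Reindex API
This API can be used to:
1. Change field types and make similar mapping changes quickly.
2. Migrate index data (optionally only the documents matching a query, and also across servers or clusters);
Example:
Migrating index data across clusters:
source describes where the data comes from; dest describes the target;
remote holds the source cluster's address, username, and password;
Note:
For a cross-cluster migration, the source cluster's address must be added to this cluster's whitelist; otherwise the migration fails with:
[127.1.1.127:9200] not whitelisted in reindex.remote.whitelist
Adding the whitelist entry:
Add the following to config/elasticsearch.yml (the IP is only an example; replace it with the actual one)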
reindex.remote.whitelist: 127.1.1.127:9200
curl -X POST "http://localhost:9200/_reindex?pretty" -H "Authorization: ${auth}" -H 'Content-Type: application/json' -d'
{
"source": {
"remote": {
"host": "http://'${ip}':9200",
"username": "'${username}'",
"password": "'${pwd}'"
},
"index": "test_result"
},
"dest": {
"index": "test_result"
}
}
'
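The command above splices shell variables into the JSON by closing and reopening the single quotes, which breaks if a value contains a space. An alternative sketch builds the body with printf; `remote_reindex_body` is a hypothetical helper and the values are made up. Values containing a double quote or backslash would still need JSON escaping, which this sketch does not do.

```shell
# Hypothetical helper: print the _reindex body for a remote source.
# $1=source host  $2=username  $3=password  $4=index (same name on both sides)
remote_reindex_body() {
  printf '{"source":{"remote":{"host":"http://%s:9200","username":"%s","password":"%s"},"index":"%s"},"dest":{"index":"%s"}}\n' \
    "$1" "$2" "$3" "$4" "$4"
}

# Made-up values; the space in the password survives intact.
remote_reindex_body 127.1.1.127 elastic 'pass word' test_result
# Usage (assumes a node on localhost:9200 and ${auth} as in the command above):
#   curl -s -H "Authorization: ${auth}" -H 'Content-Type: application/json' \
#        -X POST "http://localhost:9200/_reindex?pretty" -d "$(remote_reindex_body "$ip" "$username" "$pwd" test_result)"
```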
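Official reference: Search APIs
Difference between query and filter: a query computes a relevance score and a filter does not, so a filter is faster and uses fewer resources;
See: Query vs. Filter
Also note the difference between term and match: term is an exact lookup that does not analyze the search text, while match analyzes it before searching. See: The difference between term and match in ES
match query example:
curl -X GET "http://localhost:9200/test_index/_search" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'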
{
"size": 10,
"query": {
"match" : {
"myid" : "33_test"
}
}
}
'
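Search example: because plate numbers are tokenized into ngrams of 1 to 10 characters, searching for 湘C returns results even when the index holds only 湘J data (the single character 湘 still matches), just with a lower score:
curl -X GET "http://localhost:9200/test_index/_search?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'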
{
"query": {
"match": {
"plateno": "湘C"
}
}
}
'
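To get results only on a complete match, specify an analyzer at search time. The example below searches with custom analyzer 2, so the two characters must match as a single 2-character token: searching for 湘C finds nothing, while 湘J does:
curl -X GET "http://localhost:9200/test_index/_search?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'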
{
"query": {
"match": {
"plateno": {"analyzer":"my_analyzer2","query":"湘C"}
}
}
}
'
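To see why the 2-character analyzer behaves this way, it helps to list the tokens an ngram tokenizer with min_gram = max_gram = n produces. The sketch below imitates that for an ASCII string only, because awk's substr may count bytes rather than characters for text such as 湘; `ngrams` is a hypothetical helper, not part of Elasticsearch.

```shell
# Hypothetical helper: print every n-character substring of an ASCII string,
# which is what an ngram tokenizer with min_gram = max_gram = n emits.
# $1=text  $2=n
ngrams() {
  awk -v s="$1" -v n="$2" 'BEGIN { for (i = 1; i + n - 1 <= length(s); i++) print substr(s, i, n) }'
}

# 2-grams of the ASCII part of the sample plate number.
ngrams JAS905 2 | paste -s -d ' ' -
```

A query analyzed with my_analyzer2 becomes 2-character tokens like these, so 湘C only matches a document whose indexed tokens contain exactly 湘C.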
Query by regular expression (usable for fuzzy search), return only the specified fields, return the first 500 records, and sort them:
curl -X GET "http://localhost:9200/test_index/_search?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"size":500,
"_source":["myid","plateno"],
"query": {"bool":{"must":[{"regexp":{"myid":".*t.*"}}]}},
"sort":{"myid":{"order":"asc"}}
}
'
Find documents whose myid is 1 or 2 and whose status is missing. Note that matching several values (an array) requires terms, which works like IN in MySQL; NOT IN can be done with must_not + terms;
curl -X POST "http://localhost:9200/test_index/_search" -H "Authorization: Basic ZWxhc3RpYzoxMjM0NTY=" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{"terms": {"myid": ["1", "2"]}}
],
"must_not": [
{"exists": {"field": "status"}}
]
}
}
}
'
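Filter queries are not scored. They are normally paired with exact-value queries such as term or regexp rather than analyzed full-text queries such as match. The examples here use post_filter, which filters the hits after the query has run.
filter with a regular expression:
curl -X GET "http://localhost:9200/test_index/_search?pretty" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'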
{
"size":500,
"post_filter": {"bool":{"must":[{"regexp":{"myid":".*t.*"}}]}}
}
'
filter with term:
curl -X GET "http://localhost:9200/test_index/_search" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"size": 10,
"post_filter": {
"term" : {
"myid" : "33_test"
}
}
}
'
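Besides post_filter, a filter can also be placed inside the query itself as a bool filter clause, which runs in filter context (no scoring) and also restricts what aggregations see. A minimal sketch of such a body follows; `filtered_search_body` is a hypothetical helper that only prints the JSON.

```shell
# Hypothetical helper: print a search body that applies a term filter in
# filter context (bool.filter), so no relevance score is computed for it.
# $1=field  $2=value  $3=size
filtered_search_body() {
  printf '{"size":%s,"query":{"bool":{"filter":[{"term":{"%s":"%s"}}]}}}\n' "$3" "$1" "$2"
}

filtered_search_body myid 33_test 10
# Usage (assumes a node on localhost:9200):
#   curl -s -H 'Content-Type: application/json' -H "Authorization: Basic ..." \
#        -X GET "http://localhost:9200/test_index/_search" -d "$(filtered_search_body myid 33_test 10)"
```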
If you only need a count, use the count API, which is more efficient;
Official reference: Count API
To get the total document count of an index, the call below is very efficient. Several indices can also be counted together, in which case the result is the sum across them;
Example:
curl -X GET "http://localhost:9200/test_index/_count" -H "Authorization: Basic <Base64 of user:password>"
curl -X GET "http://localhost:9200/test_index,test1_index1/_count" -H "Authorization: Basic <Base64 of user:password>"
To count by condition:
curl -X GET "http://localhost:9200/test_index/_count" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"term" : { "myid" : "1" }
}
}
'
# Alternatively
curl -X GET "http://localhost:9200/test_index/_count?q=myid:1" -H "Authorization: Basic <Base64 of user:password>"
# Count by a time condition
curl -X GET "http://localhost:9200/*_index/_count" -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -d'
{
"query": {
"range" : {
"mytime" : {"lte":1678441960000}
}
}
}
'
# In a shell script ($etms and $log are assumed to be set earlier in the script)
cntUrl="http://localhost:9200/*_index/_count"
cntPara="{\"query\":{\"range\":{\"mytime\":{\"lte\":$etms}}}}"
cntRes=`curl "${cntUrl}" -X POST -H "Authorization: Basic <Base64 of user:password>" -H "Content-Type: application/json" -d "${cntPara}" 2>>$log`
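The shell handling stores the raw JSON response in cntRes; the number itself still has to be extracted before the script can act on it. A minimal sketch, with `count_of` as a hypothetical helper and a made-up sample response in the shape the count API returns:

```shell
# Hypothetical helper: extract the number after "count" from a _count response.
count_of() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Made-up sample response; in the script this would be the curl output.
cntRes='{"count":42,"_shards":{"total":6,"successful":6,"skipped":0,"failed":0}}'
cnt=$(printf '%s\n' "$cntRes" | count_of)

# Act on the count only when something matched.
if [ "$cnt" -gt 0 ]; then
  echo "matched $cnt documents"
fi
```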
Group by my_field and count the documents within the _time range; documents whose my_field is null are not counted this way;
curl -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -X GET "http://localhost:9200/my_result/_search" -d'
{
"query":{"bool":{"must":[{"range":{"_time":{"gte":1632931200000,"lte":1633017599000}}}]}},
"size":0,
"aggs": {
"aggs_values": {
"terms": {
"field": "my_field"
}
}
}
}'
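If the documents without a my_field value need to be counted as well, the terms aggregation accepts a `missing` option that places them in a bucket of their own. The sketch below only prints such a body; `terms_with_missing_body` is a hypothetical helper, the bucket key `N/A` is an arbitrary choice, and my_field is assumed to be a keyword field so that a string key is valid.

```shell
# Hypothetical helper: print a terms-aggregation body whose "missing" option
# puts documents without the field into a bucket named by $2.
# $1=field  $2=bucket key for missing values
terms_with_missing_body() {
  printf '{"size":0,"aggs":{"aggs_values":{"terms":{"field":"%s","missing":"%s"}}}}\n' "$1" "$2"
}

terms_with_missing_body my_field N/A
```

The same range query on _time as in the example can be added next to `"size"` to restrict the time window.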
Group by my_field and count the documents within the _time range, returning the groups sorted by count;
curl -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -X GET "http://localhost:9200/my_result/_search" -d'
{
"query":{"bool":{"must":[{"range":{"_time":{"gte":1632931200000,"lte":1633017599000}}}]}},
"size":0,
"aggs": {
"aggs_values": {
"terms": {
"field": "my_field",
"order": {"_count": "asc"}
}
}
}
}'
Group by my_field and count the documents within the _time range, sort by count, and return only 3 groups;
curl -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -X GET "http://localhost:9200/my_result/_search" -d'
{
"query":{"bool":{"must":[{"range":{"_time":{"gte":1632931200000,"lte":1633017599000}}}]}},
"size":0,
"aggs": {
"aggs_values": {
"terms": {
"field": "my_field",
"order": {"_count": "asc"},
"size":3
}
}
}
}'
Group by my_field and count the documents within the _time range, sort by count, and also return one document per group;
curl -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -X GET "http://localhost:9200/my_result/_search" -d'
{
"query":{"bool":{"must":[{"range":{"_time":{"gte":1632931200000,"lte":1633017599000}}}]}},
"size":0,
"aggs": {
"aggs_values": {
"terms": {
"field": "my_field",
"order": {"_count": "asc"}
},
"aggs": {
"top": {
"top_hits": {
"size": 1
}
}
}
}
}
}'
Group by my_field and count the documents within the _time range, sort by count, and return one document per group chosen by sorting the group on a given field;
curl -H "Authorization: Basic <Base64 of user:password>" -H 'Content-Type: application/json' -X GET "http://localhost:9200/my_result/_search" -d'
{
"query":{"bool":{"must":[{"range":{"_time":{"gte":1632931200000,"lte":1633017599000}}}]}},
"size":0,
"aggs": {
"aggs_values": {
"terms": {
"field": "my_field",
"order": {"_count": "asc"}
},
"aggs": {
"top": {
"top_hits": {
"size": 1,
"sort":{"test_field":"desc"}
}
}
}
}
}
}'
Official reference:
version 6.8