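The previous parts covered structured queries; next up is Elasticsearch's full-text search. The full-text
query family mainly consists of the following:
match query, match phrase query, match phrase prefix query,
multi match query, common terms query, intervals query, and simple query string.
That is essentially the full list. Strictly speaking, the query string query we covered earlier also belongs to
full-text search; it is usually referred to as the Lucene-syntax query. We will go through these queries one by one.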
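The match query: when we introduced the term query earlier, we mentioned in passing that a term query does not
analyze its input, while a match query does. Let's run one query first and then analyze the response: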
GET bank/_search
{
"query": {
"match": {
"firstname": "血肉苦弱,机械飞升"
}
},
"profile": "true"
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"profile" : {
"shards" : [
{
"id" : "[UhzKWPIsSgi8QaaJLHVmFg][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "firstname:血 firstname:肉 firstname:苦 firstname:弱 firstname:机 firstname:械 firstname:飞 firstname:升",
"time_in_nanos" : 177915,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 125337,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 52578
},
"children" : [
{
"type" : "TermQuery",
"description" : "firstname:血",
"time_in_nanos" : 42629,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 40568,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 2061
}
},
{
"type" : "TermQuery",
"description" : "firstname:肉",
"time_in_nanos" : 8407,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 7742,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 665
}
},
{
"type" : "TermQuery",
"description" : "firstname:苦",
"time_in_nanos" : 7289,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 6628,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 661
}
},
{
"type" : "TermQuery",
"description" : "firstname:弱",
"time_in_nanos" : 7141,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 6509,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 632
}
},
{
"type" : "TermQuery",
"description" : "firstname:机",
"time_in_nanos" : 6791,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 6179,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 612
}
},
{
"type" : "TermQuery",
"description" : "firstname:械",
"time_in_nanos" : 6721,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 6113,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 608
}
},
{
"type" : "TermQuery",
"description" : "firstname:飞",
"time_in_nanos" : 6541,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 5947,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 594
}
},
{
"type" : "TermQuery",
"description" : "firstname:升",
"time_in_nanos" : 6719,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 6094,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 625
}
}
]
}
],
"rewrite_time" : 21692,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 2376
}
]
}
],
"aggregations" : [ ]
}
]
}
}
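From the response we can see that match is built on term queries underneath. The text "血肉苦弱,机械飞升" is first
tokenized by the default analyzer (which here emits one token per Chinese character), then a term query is run
for each resulting token, and the combined results are returned.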
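To make the "analyze, then run term queries" step concrete, here is a minimal Python sketch. The tokenizer is only a rough stand-in for the default analyzer (an assumption for illustration, not Elasticsearch's real implementation), and the helper names `simple_tokenize`, `match_to_bool` and `describe` are made up for this example.

```python
def simple_tokenize(text):
    """Rough stand-in for the default analyzer (assumption): lowercase,
    split on non-alphanumerics, emit each CJK ideograph as its own token."""
    tokens, word = [], []
    for ch in text.lower():
        is_cjk = "\u4e00" <= ch <= "\u9fff"
        if ch.isalnum() and not is_cjk:
            word.append(ch)
            continue
        # Any other character ends the current Latin/digit word.
        if word:
            tokens.append("".join(word))
            word = []
        if is_cjk:
            tokens.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


def match_to_bool(field, text):
    """Expand a match query into a bool/should of term queries (default operator: or)."""
    return {"bool": {"should": [{"term": {field: tok}} for tok in simple_tokenize(text)]}}


def describe(field, text):
    """Render the expansion the way the profile 'description' shows it."""
    return " ".join(f"{field}:{tok}" for tok in simple_tokenize(text))


print(describe("firstname", "血肉苦弱,机械飞升"))
```

The printed string should have the same shape as the BooleanQuery description in the profile output: one `firstname:<token>` clause per character.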
There is also match_all. I won't show its response, because it simply returns every document:
GET bank/_search
{
"query": {
"match_all": {}
},
"profile": "true"
}
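The above is only the shorthand form of match. The match query accepts a number of other parameters; here are the commonly used ones:
query: the text to search for.
fuzziness: same meaning as in the fuzzy query, namely the maximum edit distance allowed between a matching term
and the search keyword.
operator: how the analyzed terms are combined, and or or; the default is or.
zero_terms_query: what to return when the analyzer removes every token from the query (for example, an analyzer with
a stop-word filter applied to a query made up only of stop words). The default, none, returns no documents; all returns all documents, like match_all.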
GET bank/_search
{
"query": {
"match": {
"firstname":{
"query": "Hattie",
"operator": "and",
"fuzziness":1,
"zero_terms_query": "none"
}
}
},
"profile": "true"
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 6.5042877,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 6.5042877,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "983",
"_score" : 5.4202404,
"_source" : {
"account_number" : 983,
"balance" : 47205,
"firstname" : "Mattie",
"lastname" : "Eaton",
"age" : 24,
"gender" : "F",
"address" : "418 Allen Avenue",
"employer" : "Trasola",
"email" : "mattieeaton@trasola.com",
"city" : "Dupuyer",
"state" : "NJ"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "firstname:hattie (firstname:mattie)^0.8333333",
"time_in_nanos" : 122563,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 2,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 2180,
"match" : 863,
"next_doc_count" : 2,
"score_count" : 2,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 7855,
"advance_count" : 1,
"score" : 7241,
"build_scorer_count" : 3,
"create_weight" : 56686,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 47738
},
"children" : [
{
"type" : "TermQuery",
"description" : "firstname:hattie",
"time_in_nanos" : 33019,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 3,
"compute_max_score" : 1634,
"advance" : 398,
"advance_count" : 2,
"score" : 3982,
"build_scorer_count" : 4,
"create_weight" : 13509,
"shallow_advance" : 1249,
"create_weight_count" : 1,
"build_scorer" : 12247
}
},
{
"type" : "BoostQuery",
"description" : "(firstname:mattie)^0.8333333",
"time_in_nanos" : 30033,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 3,
"compute_max_score" : 360,
"advance" : 807,
"advance_count" : 2,
"score" : 595,
"build_scorer_count" : 4,
"create_weight" : 26055,
"shallow_advance" : 323,
"create_weight_count" : 1,
"build_scorer" : 1893
}
}
]
}
],
"rewrite_time" : 852484,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 16098
}
]
}
],
"aggregations" : [ ]
}
]
}
}
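The fuzziness of 1 in the query above is why Mattie comes back alongside Hattie: the two terms are one edit apart. A minimal Python sketch of that distance follows. It uses plain Levenshtein distance and does not treat a transposition as a single edit, so it is only an approximation of what Elasticsearch does. The `fuzzy_boost` formula is my own assumption; it happens to reproduce the 0.8333333 boost shown in the profile, but it is not taken from this article.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_boost(query_term, matched_term):
    """Assumed boost for a fuzzy-expanded term: 1 - distance / shorter length."""
    d = edit_distance(query_term, matched_term)
    return 1.0 - d / min(len(query_term), len(matched_term))


print(edit_distance("hattie", "mattie"))  # one substitution: h -> m
print(fuzzy_boost("hattie", "mattie"))
```

With fuzziness set to 1, any indexed term whose distance to hattie is at most 1 is eligible, which is what the `firstname:hattie (firstname:mattie)^0.8333333` description reflects.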
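The match phrase query performs phrase search. Unlike match, it does not return a document just because one of
the analyzed terms appears in it; a document is returned only when it matches the whole phrase.
Example first: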
GET bank/_search
{
"query": {
"match_phrase": {
"address": "Bristol Street"
}
},
"profile": "true"
}
or
GET bank/_search
{
"query": {
"match_phrase": {
"address": {
"query":"Bristol Street",
"analyzer": "ik_smart"
}
}
},
"profile": "true"
}
Both forms return the same result (ik_smart comes from the IK analysis plugin, which has to be installed):
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 7.457467,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 7.457467,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "PhraseQuery",
"description" : "address:\"bristol street\"",
"time_in_nanos" : 1695954,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 1,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 19358,
"match" : 19600,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 51360,
"advance_count" : 1,
"score" : 14776,
"build_scorer_count" : 3,
"create_weight" : 99282,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 1491578
}
}
],
"rewrite_time" : 1718,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 30899
}
]
}
],
"aggregations" : [ ]
}
]
}
}
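How it works:
The phrase is analyzed into terms, and each term is looked up in the field. The candidates are then narrowed to documents
that contain all of the terms in that same field, and finally to those in which the terms appear in the same order as in the phrase. Those documents are the final result.
One setting worth mentioning is slop: the number of position moves allowed for the query terms to line up with the
terms in a document's field. The default is 0.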
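The position check and the role of slop can be sketched in a few lines of Python. This is a simplified model, not Lucene's actual matcher: it takes already-analyzed token lists, ignores scoring and repeated terms, and the function name `phrase_matches` is made up for the example.

```python
from itertools import product


def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    """Return True if the phrase occurs in doc_tokens within `slop` position moves."""
    # Collect the positions of every phrase term in the document field.
    positions = []
    for term in phrase_tokens:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return False  # a missing term can never form the phrase
        positions.append(found)
    # Shift each position back by the term's offset in the phrase. For an exact,
    # in-order match all shifted positions coincide; their spread is the number
    # of moves needed to line the terms up.
    for combo in product(*positions):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


doc = ["671", "bristol", "street"]
print(phrase_matches(doc, ["bristol", "street"]))          # adjacent and in order
print(phrase_matches(doc, ["street", "bristol"]))          # reversed order, slop 0
print(phrase_matches(doc, ["street", "bristol"], slop=2))  # reversed order, slop 2
```

In this model a reversed pair needs a slop of 2, and a one-word gap between the terms needs a slop of 1.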
The match phrase prefix query is similar to the phrase query. Let's look at the usage:
GET bank/_search
{
"query": {
"match_phrase_prefix": {
"address": "Bristol"
}
},
"profile": "true"
}
or
GET bank/_search
{
"query": {
"match_phrase_prefix": {
"address": {
"query":"Bristol",
"analyzer": "ik_smart"
}
}
},
"profile": "true"
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.5025153,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 6.5025153,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "TermQuery",
"description" : "address:bristol",
"time_in_nanos" : 42851,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1042,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 1286,
"advance_count" : 1,
"score" : 15944,
"build_scorer_count" : 3,
"create_weight" : 15913,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 8666
}
}
],
"rewrite_time" : 87691,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 43088
}
]
}
],
"aggregations" : [ ]
}
]
}
}
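How it works: it returns documents that contain the words of the provided text in the same order. The last term of the
provided text is treated as a prefix and matches any word that begins with that term.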
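As a rough illustration of the rule just described, the Python sketch below treats every term but the last as an exact match and the last one as a prefix. It works on already-analyzed tokens, ignores slop and the limit on how many terms a prefix may expand to, and `phrase_prefix_matches` is a name invented for this example.

```python
def phrase_prefix_matches(doc_tokens, query_tokens):
    """True if query_tokens occur consecutively, with the last one matched as a prefix."""
    if not query_tokens:
        return False
    *head, last = query_tokens
    width = len(query_tokens)
    # Slide a window of the phrase length over the document tokens.
    for start in range(len(doc_tokens) - width + 1):
        window = doc_tokens[start:start + width]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


address = ["789", "madison", "street"]
print(phrase_prefix_matches(address, ["789", "madison", "s"]))   # "s" is a prefix of "street"
print(phrase_prefix_matches(address, ["789", "madison", "av"]))  # no word starts with "av" here
```

With a single-term query such as Bristol, the check reduces to a plain prefix lookup on that term.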
The multi match query builds on the match query and allows searching several fields at once:
GET bank/_search
{
"query": {
"multi_match": {
"query": "Hattie",
"fields": ["firstname", "email"]
}
},
"profile": "true"
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.5042877,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 6.5042877,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "DisjunctionMaxQuery",
"description" : "(firstname:hattie | email:hattie)",
"time_in_nanos" : 133378,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1354,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 1375,
"advance_count" : 1,
"score" : 25980,
"build_scorer_count" : 3,
"create_weight" : 83277,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 21392
},
"children" : [
{
"type" : "TermQuery",
"description" : "firstname:hattie",
"time_in_nanos" : 91743,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 829,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 729,
"advance_count" : 1,
"score" : 25407,
"build_scorer_count" : 3,
"create_weight" : 51324,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 13454
}
},
{
"type" : "TermQuery",
"description" : "email:hattie",
"time_in_nanos" : 11182,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 10752,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 430
}
}
]
}
],
"rewrite_time" : 3804,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 64512
}
]
}
],
"aggregations" : [ ]
}
]
}
}
The fields list also supports the wildcard * in field names and the caret (^) for boosting, as in the following example:
GET bank/_search
{
"query": {
"multi_match": {
"query": "Hattie",
"fields": ["firstname^3", "**email"]
}
},
"profile": "true"
}
In the query above, the firstname field counts three times as much as the fields matching **email, so matching favors firstname.
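To show how a fields entry breaks down into a name pattern and a boost, here is a small Python sketch. The wildcard expansion uses `fnmatch` from the standard library purely as a stand-in for Elasticsearch's own pattern matching, and the helper names and the list of known fields are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def parse_field_spec(spec):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)


def expand_fields(specs, known_fields):
    """Resolve wildcard patterns against known field names and attach boosts."""
    boosts = {}
    for spec in specs:
        pattern, boost = parse_field_spec(spec)
        for field in known_fields:
            if fnmatchcase(field, pattern):
                boosts[field] = boost
    return boosts


known = ["firstname", "lastname", "email"]  # hypothetical subset of the bank mapping
print(expand_fields(["firstname^3", "*email"], known))
```

A field listed without a caret keeps the neutral boost of 1.0, which is why firstname^3 ends up weighted three times as heavily as email.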
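Another important multi match parameter is type, which determines how the query is executed internally. The types
are:
best_fields: finds documents that match any field, but uses the _score of the best-matching field; this is the default.
most_fields: finds documents that match any field and combines the _score from each field.
cross_fields: treats fields that share the same analyzer as one big field and looks for each word in any field.
phrase: runs a match_phrase query on each field and uses the _score of the best field.
phrase_prefix: runs a match_phrase_prefix query on each field and uses the _score of the best field.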
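The best_fields type is most useful when you search for multiple words that are best found in the same field. It
generates a match query for each field and wraps them in a dis_max query to find the single best-matching field. For a
detailed explanation of dis_max and tie_breaker, see the linked article; I won't repeat it here. With es 7.9, though, my
dis_max query did not seem to behave the way that article describes. Perhaps 7.9 changed something; I don't know, and
anyone who does is welcome to tell me. Here is the example: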
GET bank/_search
{
"query": {
"multi_match": {
"query": "Bates Street",
"type": "best_fields",
"fields": ["lastname", "address"]
}
},
"profile": "true"
}
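By my reading of dis_max, I expected all scores to be identical: I did not set tie_breaker, which defaults to 0, and
neither lastname nor address contains the complete phrase Bates Street. The response, however, differs from that
expectation. The likely explanation is that best_fields builds a term-based match query per field (not a phrase query), and
dis_max then takes the highest per-field score: document 13 matches lastname:bates, a rare term that scores high, while the
other hits only match address:street.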
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 385,
"relation" : "eq"
},
"max_score" : 6.5042877,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : 6.5042877,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 0.95495176,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "32",
"_score" : 0.95495176,
"_source" : {
"account_number" : 32,
"balance" : 48086,
"firstname" : "Dillard",
"lastname" : "Mcpherson",
"age" : 34,
"gender" : "F",
"address" : "702 Quentin Street",
"employer" : "Quailcom",
"email" : "dillardmcpherson@quailcom.com",
"city" : "Veguita",
"state" : "IN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "49",
"_score" : 0.95495176,
"_source" : {
"account_number" : 49,
"balance" : 29104,
"firstname" : "Fulton",
"lastname" : "Holt",
"age" : 23,
"gender" : "F",
"address" : "451 Humboldt Street",
"employer" : "Anocha",
"email" : "fultonholt@anocha.com",
"city" : "Sunriver",
"state" : "RI"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "51",
"_score" : 0.95495176,
"_source" : {
"account_number" : 51,
"balance" : 14097,
"firstname" : "Burton",
"lastname" : "Meyers",
"age" : 31,
"gender" : "F",
"address" : "334 River Street",
"employer" : "Bezal",
"email" : "burtonmeyers@bezal.com",
"city" : "Jacksonburg",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "63",
"_score" : 0.95495176,
"_source" : {
"account_number" : 63,
"balance" : 6077,
"firstname" : "Hughes",
"lastname" : "Owens",
"age" : 30,
"gender" : "F",
"address" : "510 Sedgwick Street",
"employer" : "Valpreal",
"email" : "hughesowens@valpreal.com",
"city" : "Guilford",
"state" : "KS"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "87",
"_score" : 0.95495176,
"_source" : {
"account_number" : 87,
"balance" : 1133,
"firstname" : "Hewitt",
"lastname" : "Kidd",
"age" : 22,
"gender" : "M",
"address" : "446 Halleck Street",
"employer" : "Isologics",
"email" : "hewittkidd@isologics.com",
"city" : "Coalmont",
"state" : "ME"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "107",
"_score" : 0.95495176,
"_source" : {
"account_number" : 107,
"balance" : 48844,
"firstname" : "Randi",
"lastname" : "Rich",
"age" : 28,
"gender" : "M",
"address" : "694 Jefferson Street",
"employer" : "Netplax",
"email" : "randirich@netplax.com",
"city" : "Bellfountain",
"state" : "SC"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "138",
"_score" : 0.95495176,
"_source" : {
"account_number" : 138,
"balance" : 9006,
"firstname" : "Daniel",
"lastname" : "Arnold",
"age" : 39,
"gender" : "F",
"address" : "422 Malbone Street",
"employer" : "Ecstasia",
"email" : "danielarnold@ecstasia.com",
"city" : "Gardiner",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "140",
"_score" : 0.95495176,
"_source" : {
"account_number" : 140,
"balance" : 26696,
"firstname" : "Cotton",
"lastname" : "Christensen",
"age" : 32,
"gender" : "M",
"address" : "878 Schermerhorn Street",
"employer" : "Prowaste",
"email" : "cottonchristensen@prowaste.com",
"city" : "Mayfair",
"state" : "LA"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "DisjunctionMaxQuery",
"description" : "((address:bates address:street) | (lastname:bates lastname:street))",
"time_in_nanos" : 382062,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 85039,
"match" : 0,
"next_doc_count" : 385,
"score_count" : 385,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 17873,
"advance_count" : 1,
"score" : 83484,
"build_scorer_count" : 3,
"create_weight" : 68479,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 127187
},
"children" : [
{
"type" : "BooleanQuery",
"description" : "address:bates address:street",
"time_in_nanos" : 197939,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 7,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 385,
"compute_max_score_count" : 7,
"compute_max_score" : 29714,
"advance" : 31586,
"advance_count" : 386,
"score" : 26437,
"build_scorer_count" : 3,
"create_weight" : 46463,
"shallow_advance" : 7704,
"create_weight_count" : 1,
"build_scorer" : 56035
},
"children" : [
{
"type" : "TermQuery",
"description" : "address:bates",
"time_in_nanos" : 31596,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 30651,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 945
}
},
{
"type" : "TermQuery",
"description" : "address:street",
"time_in_nanos" : 88777,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 7,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 385,
"compute_max_score_count" : 7,
"compute_max_score" : 29349,
"advance" : 18294,
"advance_count" : 386,
"score" : 13448,
"build_scorer_count" : 4,
"create_weight" : 8860,
"shallow_advance" : 7345,
"create_weight_count" : 1,
"build_scorer" : 11481
}
}
]
},
{
"type" : "BooleanQuery",
"description" : "lastname:bates lastname:street",
"time_in_nanos" : 36076,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 4,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 4,
"compute_max_score" : 12292,
"advance" : 160,
"advance_count" : 2,
"score" : 4667,
"build_scorer_count" : 3,
"create_weight" : 13380,
"shallow_advance" : 493,
"create_weight_count" : 1,
"build_scorer" : 5084
},
"children" : [
{
"type" : "TermQuery",
"description" : "lastname:bates",
"time_in_nanos" : 22974,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 4,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 4,
"compute_max_score" : 9812,
"advance" : 87,
"advance_count" : 2,
"score" : 4623,
"build_scorer_count" : 4,
"create_weight" : 6008,
"shallow_advance" : 315,
"create_weight_count" : 1,
"build_scorer" : 2129
}
},
{
"type" : "TermQuery",
"description" : "lastname:street",
"time_in_nanos" : 4158,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 4021,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 137
}
}
]
}
]
}
],
"rewrite_time" : 8783,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 97660
}
]
}
],
"aggregations" : [ ]
}
]
}
}
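The two distinct scores in the response above fit the dis_max rule: take the best per-field score, then add tie_breaker times the scores of the other matching fields. The Python sketch below applies that rule. The per-field scores passed in are inferred from the response (6.5042877 for lastname:bates, 0.95495176 for address:street); the response does not report them directly. The function name is made up for the example.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max query does."""
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # With tie_breaker = 0 only the best field counts; larger values let the
    # other matching fields contribute a fraction of their score.
    return best + tie_breaker * (sum(matching) - best)


# Document 13: lastname matches "bates", address matches "street" (inferred scores).
print(dis_max_score([6.5042877, 0.95495176]))
# Any other hit: only address matches "street".
print(dis_max_score([0.0, 0.95495176]))
# Same document 13 with a hypothetical tie_breaker of 0.3.
print(dis_max_score([6.5042877, 0.95495176], tie_breaker=0.3))
```

With the default tie_breaker of 0, document 13 keeps only its lastname score, and every other hit keeps its address score, which matches the ordering in the response.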
The most_fields type is most useful when several fields contain the same text analyzed in different ways:
GET bank/_search
{
"query": {
"multi_match": {
"query": "Bates Street",
"type": "most_fields",
"fields": ["lastname", "address"]
}
},
"profile": "true"
}
which is equivalent to:
GET bank/_search
{
"query": {
"bool": {
"should":[
{"match":{"lastname":"Bates Street"}},
{"match":{"address":"Bates Street"}}
]
}
},
"profile": "true"
}
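I won't show the response here. It is the same as with best_fields, so it does not show the theoretical difference,
possibly because of the data. For a detailed explanation, follow the link below.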
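The cross_fields type is especially useful with structured documents where several fields should match. For example, when searching firstname and lastname for Nanette Bates, the best match has Nanette in one field and Bates in the other.
One simple way to handle this kind of query is to index firstname and lastname into a single fullname field. Of course, that can only be done at index time.
cross_fields tries to solve the problem at query time with a term-centric approach: it first analyzes the query string into individual terms, then looks for each term in any of the fields.
A query example: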
GET bank/_search
{
"query": {
"multi_match": {
"query": "Nanette Bates",
"type": "cross_fields",
"fields": ["firstname", "lastname"],
"operator": "and"
}
},
"profile": "true"
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 13.008575,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : 13.008575,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "+(firstname:nanette | lastname:nanette) +(firstname:bates | lastname:bates)",
"time_in_nanos" : 191743,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 7352,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 3561,
"advance_count" : 1,
"score" : 2363,
"build_scorer_count" : 3,
"create_weight" : 86374,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 92093
},
"children" : [
{
"type" : "DisjunctionMaxQuery",
"description" : "(firstname:nanette | lastname:nanette)",
"time_in_nanos" : 111881,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 2,
"compute_max_score" : 2698,
"advance" : 633,
"advance_count" : 2,
"score" : 1486,
"build_scorer_count" : 4,
"create_weight" : 58990,
"shallow_advance" : 797,
"create_weight_count" : 1,
"build_scorer" : 47277
},
"children" : [
{
"type" : "TermQuery",
"description" : "firstname:nanette",
"time_in_nanos" : 25371,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 2,
"compute_max_score" : 2536,
"advance" : 506,
"advance_count" : 2,
"score" : 1419,
"build_scorer_count" : 3,
"create_weight" : 12138,
"shallow_advance" : 572,
"create_weight_count" : 1,
"build_scorer" : 8200
}
},
{
"type" : "TermQuery",
"description" : "lastname:nanette",
"time_in_nanos" : 1964,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 1813,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 151
}
}
]
},
{
"type" : "DisjunctionMaxQuery",
"description" : "(firstname:bates | lastname:bates)",
"time_in_nanos" : 30512,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 2,
"compute_max_score" : 432,
"advance" : 152,
"advance_count" : 1,
"score" : 156,
"build_scorer_count" : 3,
"create_weight" : 12599,
"shallow_advance" : 406,
"create_weight_count" : 1,
"build_scorer" : 16767
},
"children" : [
{
"type" : "TermQuery",
"description" : "firstname:bates",
"time_in_nanos" : 884,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 1,
"create_weight" : 756,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 128
}
},
{
"type" : "TermQuery",
"description" : "lastname:bates",
"time_in_nanos" : 4932,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 3,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 1,
"compute_max_score_count" : 2,
"compute_max_score" : 293,
"advance" : 97,
"advance_count" : 1,
"score" : 84,
"build_scorer_count" : 2,
"create_weight" : 1769,
"shallow_advance" : 178,
"create_weight_count" : 1,
"build_scorer" : 2511
}
}
]
}
]
}
],
"rewrite_time" : 459182,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 9741
}
]
}
],
"aggregations" : [ ]
}
]
}
}
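The profile description above shows the term-centric shape directly: one group per query term, each group spanning all fields. Here is a minimal Python sketch of that idea. It splits on whitespace instead of running a real analyzer, ignores scoring entirely, and both function names are invented for the example.

```python
def cross_fields_description(query_tokens, fields, operator="and"):
    """Build one '(field:term | field:term)' group per term, required when operator is and."""
    groups = ["(" + " | ".join(f"{field}:{term}" for field in fields) + ")"
              for term in query_tokens]
    prefix = "+" if operator == "and" else ""
    return " ".join(prefix + group for group in groups)


def cross_fields_matches(doc, query_tokens, fields, operator="and"):
    """Term-centric check: each term may be found in any of the fields."""
    found = [any(term in doc.get(field, "").lower().split() for field in fields)
             for term in query_tokens]
    return all(found) if operator == "and" else any(found)


fields = ["firstname", "lastname"]
terms = ["nanette", "bates"]
print(cross_fields_description(terms, fields))
print(cross_fields_matches({"firstname": "Nanette", "lastname": "Bates"}, terms, fields))
```

A field-centric query with operator and would instead demand both terms in the same field, which no document in this example satisfies.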
The phrase and phrase_prefix types behave like best_fields, except that they use a match_phrase or
match_phrase_prefix query instead of a match query.
A query example:
GET bank/_search
{
"query": {
"multi_match": {
"query": "789 Madison S",
"type": "phrase_prefix",
"fields": ["address", "lastname"]
}
},
"profile": "true"
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 325.35672,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : 325.35672,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "DisjunctionMaxQuery",
"description" : """(address:"789 madison (schaefer schenck schenectady seton seaview sackett stuart story schermerhorn stockholm sumner sumpter sackman sedgwick stryker seigel sands saratoga street summit surf stillwell sunnyside schweikerts stewart stockton scott stratford school sheffield seeley square shale strickland seabring stuyvesant schroeders strong senator seagate strauss sandford sharon sapphire seba stone sullivan stoddard seacoast scholes)" | lastname:"789 madison (schmidt sanchez schneider santos serrano schroeder sherman sellers schultz shaffer shaw santana sims small sexton savage salazar salinas sheppard shepherd sharp simpson sanford snow sandoval singleton slater shepard scott santiago simon sloan saunders salas sharpe sears snider sampson short simmons smith skinner silva shields sargent shelton sanders shannon sawyer schwartz)")""",
"time_in_nanos" : 821485,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 1,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 52218,
"match" : 36513,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 40671,
"advance_count" : 1,
"score" : 16465,
"build_scorer_count" : 3,
"create_weight" : 535511,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 140107
},
"children" : [
{
"type" : "MultiPhraseQuery",
"description" : "address:\"789 madison (schaefer schenck schenectady seton seaview sackett stuart story schermerhorn stockholm sumner sumpter sackman sedgwick stryker seigel sands saratoga street summit surf stillwell sunnyside schweikerts stewart stockton scott stratford school sheffield seeley square shale strickland seabring stuyvesant schroeders strong senator seagate strauss sandford sharon sapphire seba stone sullivan stoddard seacoast scholes)\"",
"time_in_nanos" : 532724,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 1,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 52152,
"match" : 36439,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 40449,
"advance_count" : 1,
"score" : 16415,
"build_scorer_count" : 3,
"create_weight" : 270345,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 116924
}
},
{
"type" : "MultiPhraseQuery",
"description" : "lastname:\"789 madison (schmidt sanchez schneider santos serrano schroeder sherman sellers schultz shaffer shaw santana sims small sexton savage salazar salinas sheppard shepherd sharp simpson sanford snow sandoval singleton slater shepard scott santiago simon sloan saunders salas sharpe sears snider sampson short simmons smith skinner silva shields sargent shelton sanders shannon sawyer schwartz)\"",
"time_in_nanos" : 255124,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 252146,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 2978
}
}
]
}
],
"rewrite_time" : 329671,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 22719
}
]
}
],
"aggregations" : [ ]
}
]
}
}
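The official documentation does not seem to describe this usage, and the es 7.9 I am using does not appear to support
it either, so if you want to know more, follow the link.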
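The intervals query returns documents based on the order and proximity of matching terms. It uses matching rules
built from a small set of definitions, and these rules are applied to the terms of a specified field.
The definitions produce sequences of minimal intervals that span terms in the body of text. These intervals can be further combined and filtered by parent sources.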
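Since no request is shown for this query, here is a hedged Python sketch that builds the body of a simple intervals request using the match rule with its query, max_gaps and ordered options. The field and text are borrowed from the earlier bank examples as an assumption, the helper name is made up, and the request has not been run against the index used in this article.

```python
import json


def intervals_match_body(field, text, max_gaps=0, ordered=True):
    """Build a search body with a single intervals 'match' rule on one field."""
    return {
        "query": {
            "intervals": {
                field: {
                    "match": {
                        "query": text,         # analyzed into terms
                        "max_gaps": max_gaps,  # 0 means the terms must be adjacent
                        "ordered": ordered,    # True means the terms must keep this order
                    }
                }
            }
        }
    }


body = intervals_match_body("address", "Bristol Street")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET bank/_search line in the console; with max_gaps 0 and ordered true the rule behaves much like a phrase match.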