赞
踩
对象和 Nested 对象的局限性
ES 提供了类似关系型数据库中 Join 的实现。使用 Join 数据类型实现,可以通过 Parent / Child 的关系,从而分离两个对象
定义父子关系的几个步骤
DELETE my_blogs # 设定 Parent/Child Mapping PUT my_blogs { "settings": { "number_of_shards": 2 }, "mappings": { "properties": { "blog_comments_relation": { "type": "join", "relations": { "blog": "comment" } }, "content": { "type": "text" }, "title": { "type": "keyword" } } } }
#索引父文档 PUT my_blogs/_doc/blog1 { "title":"Learning Elasticsearch", "content":"learning ELK @ geektime", "blog_comments_relation":{ "name":"blog" } } #索引父文档 PUT my_blogs/_doc/blog2 { "title":"Learning Hadoop", "content":"learning Hadoop", "blog_comments_relation":{ "name":"blog" } }
父文档和子文档必须存在相同的分片上
当指定文档时候,必须指定它的父文档 ID
#索引子文档 PUT my_blogs/_doc/comment1?routing=blog1 { "comment":"I am learning ELK", "username":"Jack", "blog_comments_relation":{ "name":"comment", "parent":"blog1" } } #索引子文档 PUT my_blogs/_doc/comment2?routing=blog2 { "comment":"I like Hadoop!!!!!", "username":"Jack", "blog_comments_relation":{ "name":"comment", "parent":"blog2" } } #索引子文档 PUT my_blogs/_doc/comment3?routing=blog2 { "comment":"Hello Hadoop", "username":"Bob", "blog_comments_relation":{ "name":"comment", "parent":"blog2" } }
# 查询所有文档 POST my_blogs/_search { } 返回输出: ... #根据父文档ID查看 GET my_blogs/_doc/blog2 返回输出: { "_index" : "my_blogs", "_type" : "_doc", "_id" : "blog2", "_version" : 1, "_seq_no" : 1, "_primary_term" : 1, "found" : true, "_source" : { "title" : "Learning Hadoop", "content" : "learning Hadoop", "blog_comments_relation" : { "name" : "blog" } } } # Parent Id 查询 POST my_blogs/_search { "query": { "parent_id": { "type": "comment", "id": "blog2" } } } 返回输出: { "took" : 0, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.44183275, "hits" : [ { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment2", "_score" : 0.44183275, "_routing" : "blog2", "_source" : { "comment" : "I like Hadoop!!!!!", "username" : "Jack", "blog_comments_relation" : { "name" : "comment", "parent" : "blog2" } } }, { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment3", "_score" : 0.44183275, "_routing" : "blog2", "_source" : { "comment" : "Hello Hadoop", "username" : "Bob", "blog_comments_relation" : { "name" : "comment", "parent" : "blog2" } } } ] } } # Has Child 查询,返回父文档 POST my_blogs/_search { "query": { "has_child": { "type": "comment", "query" : { "match": { "username" : "Jack" } } } } } 返回输出: { "took" : 43, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "my_blogs", "_type" : "_doc", "_id" : "blog1", "_score" : 1.0, "_source" : { "title" : "Learning Elasticsearch", "content" : "learning ELK @ geektime", "blog_comments_relation" : { "name" : "blog" } } }, { "_index" : "my_blogs", "_type" : "_doc", "_id" : "blog2", "_score" : 1.0, "_source" : { "title" : "Learning Hadoop", "content" : "learning Hadoop", "blog_comments_relation" : { "name" : "blog" } } } ] } } # Has Parent 查询,返回相关的子文档 POST my_blogs/_search { "query": { "has_parent": { "parent_type": "blog", "query" : { "match": { "title" : "Learning Hadoop" } } } } } 返回输出: { "took" : 4, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment2", "_score" : 1.0, "_routing" : "blog2", "_source" : { "comment" : "I like Hadoop!!!!!", "username" : "Jack", "blog_comments_relation" : { "name" : "comment", "parent" : "blog2" } } }, { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment3", "_score" : 1.0, "_routing" : "blog2", "_source" : { "comment" : "Hello Hadoop", "username" : "Bob", "blog_comments_relation" : { "name" : "comment", "parent" : "blog2" } } } ] } } #通过ID ,访问子文档 GET my_blogs/_doc/comment3 返回输出: { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment3", "found" : false } #通过ID和routing ,访问子文档 GET my_blogs/_doc/comment3?routing=blog2 返回输出: { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment3", "_version" : 1, "_seq_no" : 5, "_primary_term" : 1, "_routing" : "blog2", "found" : true, "_source" : { "comment" : "Hello Hadoop", "username" : "Bob", "blog_comments_relation" : { "name" : "comment", "parent" : "blog2" } } } #更新子文档 PUT my_blogs/_doc/comment3?routing=blog2 { "comment": "Hello Hadoop??", "blog_comments_relation": { "name": "comment", "parent": "blog2" } } 返回输出: { "_index" : "my_blogs", "_type" : "_doc", "_id" : "comment3", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 6, "_primary_term" : 1 }
Join类型的Mapping如下:
核心
• 1) "my_join_field"为join的名称。
• 2)“question”: “answer” 指:qustion为answer的父类。
PUT join_ext_index { "mappings": { "properties": { "my_join_field": { "type": "join", "relations": { "question": [ "answer", "comment" ] } } } } }
文档类型为父类型:“question”。
PUT my_join_index/_doc/1?refresh
{
"text": "This is a question",
"my_join_field": "question"
}
PUT my_join_index/_doc/2?refresh
{
"text": "This is another question",
"my_join_field": "question"
}
• 路由值是强制性的,因为父文件和子文件必须在相同的分片上建立索引。
• "answer"是此子文档的加入名称。
• 指定此子文档的父文档ID:1。
PUT my_join_index/_doc/3?routing=1&refresh { "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } } PUT my_join_index/_doc/4?routing=1&refresh { "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } }
GET my_join_index/_search
{
"query": {
"match_all": {}
},
"sort": ["_id"]
}
返回结果如下:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 4, "max_score": null, "hits": [ { "_index": "my_join_index", "_type": "_doc", "_id": "1", "_score": null, "_source": { "text": "This is a question", "my_join_field": "question" }, "sort": [ "1" ] }, { "_index": "my_join_index", "_type": "_doc", "_id": "2", "_score": null, "_source": { "text": "This is another question", "my_join_field": "question" }, "sort": [ "2" ] }, { "_index": "my_join_index", "_type": "_doc", "_id": "3", "_score": null, "_routing": "1", "_source": { "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } }, "sort": [ "3" ] }, { "_index": "my_join_index", "_type": "_doc", "_id": "4", "_score": null, "_routing": "1", "_source": { "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } }, "sort": [ "4" ] } ] } }
GET my_join_index/_search
{
"query": {
"has_parent" : {
"parent_type" : "question",
"query" : {
"match" : {
"text" : "This is"
}
}
}
}
}
返回结果:
{ "took": 0, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "my_join_index", "_type": "_doc", "_id": "3", "_score": 1, "_routing": "1", "_source": { "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } } }, { "_index": "my_join_index", "_type": "_doc", "_id": "4", "_score": 1, "_routing": "1", "_source": { "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } } } ] } }
GET my_join_index/_search
{
"query": {
"has_child" : {
"type" : "answer",
"query" : {
"match" : {
"text" : "This is question"
}
}
}
}
}
返回结果:
{ "took": 0, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "my_join_index", "_type": "_doc", "_id": "1", "_score": 1, "_source": { "text": "This is a question", "my_join_field": "question" } } ] } }
以下操作含义如下:
• 1)parent_id是特定的检索方式,用于检索属于特定父文档id=1的,子文档类型为answer的文档的个数。
• 2)基于父文档类型question进行聚合;
• 3)基于指定的field处理。
GET my_join_index/_search { "query": { "parent_id": { "type": "answer", "id": "1" } }, "aggs": { "parents": { "terms": { "field": "my_join_field#question", "size": 10 } } }, "script_fields": { "parent": { "script": { "source": "doc['my_join_field#question']" } } } }
返回结果:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.13353139, "hits": [ { "_index": "my_join_index", "_type": "_doc", "_id": "3", "_score": 0.13353139, "_routing": "1", "fields": { "parent": [ "1" ] } }, { "_index": "my_join_index", "_type": "_doc", "_id": "4", "_score": 0.13353139, "_routing": "1", "fields": { "parent": [ "1" ] } } ] }, "aggregations": { "parents": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "1", "doc_count": 2 } ] } } }
如下,一个父文档question与多个子文档answer,comment的映射定义。
PUT join_ext_index
{
"mappings": {
"_doc": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"question": ["answer", "comment"]
}
}
}
}
}
}
实现如下图的祖孙三代关联关系的定义。
question
/ \
/ \
comment answer
|
|
vote
PUT join_multi_index { "mappings": { "properties": { "my_join_field": { "type": "join", "relations": { "question": [ "answer", "comment" ], "answer": "vote" } } } } }
孙子文档导入数据,如下所示:
PUT join_multi_index/_doc/3?routing=1&refresh
{
"text": "This is a vote",
"my_join_field": {
"name": "vote",
"parent": "2"
}
}
注意:
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。