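Normalization in relational databases: the main goal of normalized design is to avoid unnecessary updates, because each fact is stored only once. A fully normalized schema, however, often suffers from slow queries: the more normalized the database, the more tables a query has to join.
Denormalization: the data is kept flat. Instead of using relations, each document stores a redundant copy of the related data.
Relational databases usually normalize data, whereas in Elasticsearch you usually denormalize it. Denormalization gives faster reads, needs no table joins, and needs no row locks.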
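To make the trade-off concrete, here is a minimal Python sketch that uses plain dicts as a stand-in for tables and documents. Everything in it is hypothetical and illustrative (the function names, the second article, the `author_id` key); it is not an Elasticsearch API. A normalized read needs a lookup, which plays the role of the join, while renaming an author in denormalized data means rewriting every document that embeds a copy.

```python
def read_normalized(article, authors):
    """Normalized read: resolve author_id with a lookup (the 'join')."""
    return {**article, "author": authors[article["author_id"]]}


def rename_author_normalized(authors, userid, new_name):
    """Normalized write: the author exists once, so one record changes."""
    authors[userid]["username"] = new_name
    return 1  # number of records written


def rename_author_denormalized(articles, userid, new_name):
    """Denormalized write: every article holding a copy must be rewritten."""
    writes = 0
    for article in articles:
        if article["author"]["userid"] == userid:
            article["author"]["username"] = new_name
            writes += 1
    return writes  # number of records written


if __name__ == "__main__":
    # Hypothetical sample data: one author who wrote two articles.
    authors = {1001: {"userid": 1001, "username": "liu"}}
    normalized = [
        {"id": 1, "content": "Elasticsearch Helloworld!", "author_id": 1001},
        {"id": 2, "content": "Another article", "author_id": 1001},
    ]
    denormalized = [
        {"id": 1, "content": "Elasticsearch Helloworld!",
         "author": {"userid": 1001, "username": "liu"}},
        {"id": 2, "content": "Another article",
         "author": {"userid": 1001, "username": "liu"}},
    ]
    print(read_normalized(normalized[0], authors)["author"]["username"])  # liu
    print(rename_author_normalized(authors, 1001, "jia"))  # 1
    print(rename_author_denormalized(denormalized, 1001, "jia"))  # 2
```

Reads on the denormalized side need no lookup at all, which is why Elasticsearch favors that shape; the price is the extra writes when shared data changes.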
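Elasticsearch is not good at handling relationships. They are generally handled in one of four ways: object fields (denormalized data), nested objects, parent/child relationships (the join field), and application-side joins. The two relationship-aware options compare as follows: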
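| | Nested Object | Parent/Child |
|---|---|---|
| Pros | Documents are stored together, so read performance is high | Parent and child documents can be updated independently |
| Cons | Updating a nested sub-document requires reindexing the whole document | Extra memory is needed to maintain the relationship; read performance is comparatively worse |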
Case 1: article and author information (1:1 relationship)
DELETE articles

# Set the mappings of articles (author is a plain object field)
PUT /articles
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "time": { "type": "date" },
      "author": {
        "properties": {
          "userid": { "type": "long" },
          "username": { "type": "keyword" }
        }
      }
    }
  }
}

# Index a test document
PUT articles/_doc/1
{
  "content": "Elasticsearch Helloworld!",
  "time": "2020-01-01T00:00:00",
  "author": {
    "userid": 1001,
    "username": "liu"
  }
}

# Query
POST articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "Elasticsearch" } },
        { "match": { "author.username": "liu" } }
      ]
    }
  }
}
Case 2: article and author information (1:n relationship) (this one has a problem!)
DELETE articles

# Set the mappings of articles (author is still a plain object field)
PUT /articles
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "time": { "type": "date" },
      "author": {
        "properties": {
          "userid": { "type": "long" },
          "username": { "type": "keyword" }
        }
      }
    }
  }
}

POST articles/_search

# Index a test document with two authors
PUT articles/_doc/1
{
  "content": "Elasticsearch Helloworld!",
  "time": "2020-01-01T00:00:00",
  "author": [
    { "userid": 1001, "username": "liu" },
    { "userid": 1002, "username": "jia" }
  ]
}

# Query: userid 1001 belongs to liu, not jia, yet this still matches! Why?
POST articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author.userid": "1001" } },
        { "match": { "author.username": "jia" } }
      ]
    }
  }
}
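When an object field holds an array of objects, queries return results we did not want. What is the cause?
At index time the boundaries between the inner objects are not preserved: the JSON is flattened into key-value pairs, with one list of values per field path. A query on several fields can therefore match values that came from different objects. Document 1 is stored roughly as: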
"content":"Elasticsearch Helloworld!"
"time":"2020-01-01T00:00:00"
"author.userid":["1001","1002"]
"author.username":["liu","jia"]
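The flattening can be imitated with a short Python sketch. It is only an approximation under stated assumptions: `flatten` and `bool_must` are hypothetical helpers, and matching is simplified to exact-value membership (no analyzers, no scoring). It is enough to reproduce the false positive from Case 2.

```python
def flatten(doc):
    """Roughly mimic how an object field is indexed: every leaf value is
    collected under its dotted path, and arrays of objects are merged, so
    the boundaries between the inner objects are lost."""
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)  # all array elements share the same path
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


def bool_must(flat, conditions):
    """Simplified bool/must of exact-value matches: each condition only needs
    its value to appear somewhere in that field's value list."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


if __name__ == "__main__":
    doc = {
        "content": "Elasticsearch Helloworld!",
        "time": "2020-01-01T00:00:00",
        "author": [
            {"userid": 1001, "username": "liu"},
            {"userid": 1002, "username": "jia"},
        ],
    }
    flat = flatten(doc)
    print(flat["author.userid"])  # [1001, 1002]
    print(flat["author.username"])  # ['liu', 'jia']
    # liu is 1001 and jia is 1002, but the flattened lists still satisfy both:
    print(bool_must(flat, {"author.userid": 1001, "author.username": "jia"}))  # True
```

Because each condition is checked against a whole list independently, nothing ties 1001 to liu any more.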
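Nested objects solve this problem.
A nested field lets each object in an array be indexed independently: declaring author with the nested type (plus its properties) indexes every author as a separate hidden document. Internally, each nested object is stored as its own Lucene document alongside the root document (the two-author article below therefore occupies three Lucene documents), and the nested query joins them at search time.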
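The following Python sketch imitates that layout under simplifying assumptions: `split_nested` and `nested_must` are hypothetical helpers, matching is exact equality, and the condition keys are field names inside the nested object (userid rather than author.userid). Because each condition set is evaluated against one hidden document at a time, values from different authors can no longer combine.

```python
def split_nested(doc, path):
    """Mimic a nested mapping: one hidden document per object in doc[path],
    followed by the root document, which no longer carries those objects."""
    hidden = [dict(obj) for obj in doc.get(path, [])]
    root = {key: value for key, value in doc.items() if key != path}
    return hidden + [root]


def nested_must(doc, path, conditions):
    """Simplified nested query: matches only if ONE nested object satisfies
    every condition at once."""
    hidden = split_nested(doc, path)[:-1]  # drop the trailing root document
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in hidden
    )


if __name__ == "__main__":
    doc = {
        "content": "Elasticsearch Helloworld!",
        "time": "2020-01-01T00:00:00",
        "author": [
            {"userid": 1001, "username": "liu"},
            {"userid": 1002, "username": "jia"},
        ],
    }
    print(len(split_nested(doc, "author")))  # 3 (two authors + the root)
    print(nested_must(doc, "author", {"userid": 1001, "username": "jia"}))  # False
    print(nested_must(doc, "author", {"userid": 1001, "username": "liu"}))  # True
```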
Case 3: article and author information (1:n relationship, using nested objects)
DELETE articles

# Set the mappings of articles (author is now a nested field)
PUT /articles
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "time": { "type": "date" },
      "author": {
        "type": "nested",
        "properties": {
          "userid": { "type": "long" },
          "username": { "type": "keyword" }
        }
      }
    }
  }
}

POST articles/_search

# Index a test document with two authors
PUT articles/_doc/1
{
  "content": "Elasticsearch Helloworld!",
  "time": "2020-01-01T00:00:00",
  "author": [
    { "userid": 1001, "username": "liu" },
    { "userid": 1002, "username": "jia" }
  ]
}

# Query: no hits this time, because userid 1001 and username jia
# belong to different author objects
POST articles/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "author",
            "query": {
              "bool": {
                "must": [
                  { "match": { "author.userid": "1001" } },
                  { "match": { "author.username": "jia" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
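Both object and nested fields share a limitation: every update has to reindex the whole document. Elasticsearch offers something similar to a relational join: by maintaining a Parent/Child relationship, the two kinds of objects are kept apart. Parent and child documents are independent documents, so updating a parent does not require reindexing its children, and adding, updating, or deleting a child affects neither the parent nor its other children.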
Case 4: article and author information (1:n relationship, using Parent/Child)
DELETE articles

# Set the mappings of articles: the join field defines article (parent) -> author (child)
PUT /articles
{
  "mappings": {
    "properties": {
      "article_author_relation": {
        "type": "join",
        "relations": {
          "article": "author"
        }
      },
      "content": { "type": "text" },
      "time": { "type": "date" }
    }
  }
}

# Index the parent document
PUT articles/_doc/article1
{
  "article_author_relation": {
    "name": "article"
  },
  "content": "Elasticsearch Helloworld!",
  "time": "2020-01-01T00:00:00"
}

# Index the child documents; routing=article1 keeps each child on its parent's shard
PUT articles/_doc/author1?routing=article1
{
  "article_author_relation": {
    "name": "author",
    "parent": "article1"
  },
  "userid": "1001",
  "username": "jia"
}

PUT articles/_doc/author2?routing=article1
{
  "article_author_relation": {
    "name": "author",
    "parent": "article1"
  },
  "userid": "1002",
  "username": "liu"
}

# Fetch the parent document by id
GET articles/_doc/article1

# Search all documents
POST articles/_search

# parent_id: return the child documents of the given parent id
POST articles/_search
{
  "query": {
    "parent_id": {
      "type": "author",
      "id": "article1"
    }
  }
}

# has_child: return parent documents that have a matching child
POST articles/_search
{
  "query": {
    "has_child": {
      "type": "author",
      "query": {
        "match": { "username": "liu" }
      }
    }
  }
}

# has_parent: return child documents whose parent matches
POST articles/_search
{
  "query": {
    "has_parent": {
      "parent_type": "article",
      "query": {
        "match": { "content": "elasticsearch" }
      }
    }
  }
}
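The semantics of the three queries can be sketched in Python over an in-memory copy of the documents above. This is a simplified model, not how Elasticsearch executes them: there are no shards, routing, global ordinals, or scoring; `tokens` is a rough lowercase word tokenizer standing in for the standard analyzer, and `match` succeeds if any query token occurs in the field. All helper names are hypothetical.

```python
import re


def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", str(text).lower()))


def match(doc, field, text):
    """Simplified match query: true if any query token occurs in the field."""
    return bool(tokens(text) & tokens(doc.get(field, "")))


def relation(doc):
    return doc["article_author_relation"]


def parent_id(docs, child_type, pid):
    """Ids of children of type child_type whose parent is pid."""
    return sorted(i for i, d in docs.items()
                  if relation(d)["name"] == child_type
                  and relation(d).get("parent") == pid)


def has_child(docs, child_type, predicate):
    """Ids of parents with at least one child of child_type matching predicate."""
    return sorted({relation(d)["parent"] for d in docs.values()
                   if relation(d)["name"] == child_type and predicate(d)})


def has_parent(docs, parent_type, predicate):
    """Ids of children whose parent (of parent_type) matches predicate."""
    parents = {i for i, d in docs.items()
               if relation(d)["name"] == parent_type and predicate(d)}
    return sorted(i for i, d in docs.items()
                  if relation(d).get("parent") in parents)


# The three documents indexed in Case 4, keyed by _id.
DOCS = {
    "article1": {"article_author_relation": {"name": "article"},
                 "content": "Elasticsearch Helloworld!",
                 "time": "2020-01-01T00:00:00"},
    "author1": {"article_author_relation": {"name": "author", "parent": "article1"},
                "userid": "1001", "username": "jia"},
    "author2": {"article_author_relation": {"name": "author", "parent": "article1"},
                "userid": "1002", "username": "liu"},
}

if __name__ == "__main__":
    print(parent_id(DOCS, "author", "article1"))  # ['author1', 'author2']
    print(has_child(DOCS, "author", lambda d: match(d, "username", "liu")))  # ['article1']
    print(has_parent(DOCS, "article",
                     lambda d: match(d, "content", "elasticsearch")))  # ['author1', 'author2']
```

Since each document is a separate entry, changing article1's content touches only that entry, which mirrors the independent-update property described above.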
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.