当前位置:   article > 正文

MongoDB 索引之全文索引_mongo全文索引

mongo全文索引
mongodb full text search(fts:全文搜素)是在版本2.4新加的特性。在以前的版本,是通过精确匹配和正则表达式来查询,这效率是很低的。全文索引,能够从大量的文本中搜索出所需的内容,内置多国语言和分词方法。不支持宇宙第一语言—中文。全文索引会导致mongodb写入性能下降,因为所有字符串都要拆分,存储到不同地方。 
全文索引是一种技术,并大量的使用。如搜索引擎,站内搜索等等。常使用到的全文搜索有: 
lucene 
sphinx 
redis-search 
riak search 
不过,对中文的搜索不尽如人意。 
mongodb用的是开源的snowball分词器。参见 http://snowball.tartarus.org/texts/stemmersoverview.html
mongodb 全文索引支持的语言有: 
danish 
dutch 
english 
finnish 
french 
german 
hungarian 
italian 
norwegian 
portuguese 
romanian 
russian 
spanish 
swedish 
turkish 
如果希望使用其他语言,需要在创建索引时指定要使用的语言。默认是支持英文的。 
>db.collection.ensureIndex({content:"text"},{default_language:"spanish"}) 
一、开启全文索引 
默认是关闭的。否则会报错"err" : "text search not enabled" 
可以在脚本中声明启用: 
> db.adminCommand({setParameter:1,textSearchEnabled:true}) 
{ "was" : true, "ok" : 1 } 
也修改配置启用:mongod --setParameter textSearchEnabled=true 
二、插入测试数据 
下面是插入一些《加州旅馆》这首二十世纪非常著名的流行音乐。 
> db.ttlsa_com.insert({"song":"1. Hotel California", "lyrics": "On a dark desert highway, cool wind in my hair. Warm smell of colitas, rising up through the air."}) 
> db.ttlsa_com.insert({"song":"2. Hotel California", "lyrics": "Up ahead in the distance, I saw a shimmering light. My head grew heavy and my sight grew dim."}) 
> db.ttlsa_com.insert({"song":"3. Hotel California", "lyrics": "Such a lovely place, Such a lovely face."}) 
> db.ttlsa_com.insert({"song":"4. Hotel California", "lyrics": "Some dance to remember, some dance to forget."}) 
> db.ttlsa_com.insert({"song":"5. Hotel California", "lyrics": "Welcome to the Hotel California"}) 
> db.ttlsa_com.insert({"song":"hell world", "lyrics": "Welcome to beijing"}) 
> db.ttlsa_com.insert({"song":"加州旅馆", "lyrics": "Welcome to the Hotel California"}) 
三、创建全文索引 
> db.ttlsa_com.ensureIndex({"song":"text", "lyrics":"text"}) 
或者 
> db.ttlsa_com.ensureIndex({"$**": "text"}) 
$**表示在所有的字符串字段上创建一个全文索引。 
也可以指定权重 
> db.ttlsa_com.ensureIndex({"song":"text"},{"weights":{"song": 2, "$**": 3}}) 
四、查看索引信息 
>db.system.indexes.find().toArray() 
[{
"v": 1,
"key": {
"_id": 1
},
"name": "_id_",
"ns": "test.ttlsa_com"
},
{
"v": 1,
"key": {
"_fts": "text",
"_ftsx": 1
},
"name": "song__lyrics_",
"ns": "test.ttlsa_com",
"weights": {
"lyrics": 1,
"song": 1
},
"default_language": "english",
"language_override": "language",
"textIndexVersion": 2
}] 
五、查询 
全文索引是通过 
db.collection.runCommand( "text", { search: <string>, 
filter: <document>, 
project: <document>, 
limit: <number>, 
language: <string> } ) 
来查询的,非通过find()命令来实现的。 也可以通过find查询,如:db.ttlsa_com.find({$text:{$search:"aa bb"}});
1.搜索一个词,文本命令是不区分大小写的 
> db.ttlsa_com.runCommand("text",{search:"Welcome"}) 
{ 
"results" : [ 
{ 
"score" : 0.6666666666666666, 
"obj" : { 
"_id" : ObjectId("550a615dd8511e4dbcb8c77a"), 
"song" : "5. Hotel California", 
"lyrics" : "Welcome to the Hotel California" 
} 
}, 
{ 
"score" : 0.6666666666666666, 
"obj" : { 
"_id" : ObjectId("550a6163d8511e4dbcb8c77b"), 
"song" : "加州旅馆", 
"lyrics" : "Welcome to the Hotel California" 
} 
} 
], 
"stats" : { 
"nscanned" : 2, 
"nscannedObjects" : 2, 
"n" : 2, 
"timeMicros" : 1486, 
"shards" : { 
"shard_a" : { 
"nscanned" : NumberLong(2), 
"nscannedObjects" : NumberLong(2), 
"n" : 2, 
"timeMicros" : 917 
} 
} 
}, 
"ok" : 1 
} 
2.模糊搜索 
> db.ttlsa_com.runCommand("text",{search:"Such lovely"}) 
{ 
"results" : [ 
{ 
"score" : 1.125, 
"obj" : { 
"_id" : ObjectId("550a6154d8511e4dbcb8c778"), 
"song" : "3. Hotel California", 
"lyrics" : "Such a lovely place, Such a lovely face." 
} 
} 
], 
"stats" : { 
"nscanned" : 1, 
"nscannedObjects" : 1, 
"n" : 1, 
"timeMicros" : 618, 
"shards" : { 
"shard_a" : { 
"nscanned" : NumberLong(1), 
"nscannedObjects" : NumberLong(1), 
"n" : 1, 
"timeMicros" : 272 
} 
} 
}, 
"ok" : 1 
} 
3.匹配短语 
> db.ttlsa_com.runCommand("text",{search:"\"Some dance to remember\""}) 
{ 
"results" : [ 
{ 
"score" : 1.75, 
"obj" : { 
"_id" : ObjectId("550a6158d8511e4dbcb8c779"), 
"song" : "4. Hotel California", 
"lyrics" : "Some dance to remember, some dance to forget." 
} 
} 
], 
"stats" : { 
"nscanned" : 2, 
"nscannedObjects" : 1, 
"n" : 1, 
"timeMicros" : 743, 
"shards" : { 
"shard_a" : { 
"nscanned" : NumberLong(2), 
"nscannedObjects" : NumberLong(1), 
"n" : 1, 
"timeMicros" : 306 
} 
} 
}, 
"ok" : 1 
} 
4.匹配一些词但是不能包含指定的词 
> db.ttlsa_com.runCommand("text",{search:"Welcome -California"}) 
{ 
"results" : [ 
{ 
"score" : 0.75, 
"obj" : { 
"_id" : ObjectId("550a687fd8511e4dbcb8c781"), 
"song" : "hell world", 
"lyrics" : "Welcome to beijing" 
} 
} 
], 
"stats" : { 
"nscanned" : 4, 
"nscannedObjects" : 4, 
"n" : 1, 
"timeMicros" : 743, 
"shards" : { 
"shard_a" : { 
"nscanned" : NumberLong(4), 
"nscannedObjects" : NumberLong(4), 
"n" : 1, 
"timeMicros" : 345 
} 
} 
}, 
"ok" : 1 
} 
5.限制结果集条目 
>db.ttlsa_com.runCommand("text",{search:"Welcome",limit: 1}) 
6.在结果集中指定返回的字段 
>db.ttlsa_com.runCommand("text",{search:"Welcome",project:{"song":1,"_id":0}}) 
7.带额外查询条件的搜索 
>db.ttlsa_com.runCommand("text",{search:"Welcome",filter: {"song":"hell world"}}) 
8.使用特定语言搜索文本 
>db.ttlsa_com.runCommand("text",{search:"Welcome",language:"spanish"}) 
9.分词效果不是很理想。如 
>db.ttlsa_com.runCommand("text",{search:"the"}) 
{ 
"results" : [ ], 
"stats" : { 
"nscanned" : 0, 
"nscannedObjects" : 0, 
"n" : 0, 
"timeMicros" : 432, 
"shards" : { 
"shard_a" : { 
"nscanned" : NumberLong(0), 
"nscannedObjects" : NumberLong(0), 
"n" : 0, 
"timeMicros" : 151 
} 
} 
}, 
"ok" : 1 
} 
就搜索不到的。 
mongodb是不支持中文搜索的。 全文索引可以通过相似度,来查询与条件更匹配的文章,如下:
>db. ttlsa_com.find({$text:{$search:"aa bb"}},{score:{$meta:"textScore"}})
查询结果返回相似度分数。可以结合sort进行相似度排序。
>db. ttlsa_com.find({$text:{$search:"aa bb"}},{score:{$meta:"textScore"}}).sort({score:{$meta:"textScore"}});
要用到复杂的搜索功能,还是用sphinx。 
sphinx内容可以参见: http://www.ttlsa.com/?s=sphinx
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/一键难忘520/article/detail/883098
推荐阅读
相关标签
  

闽ICP备14008679号