This falls in the "naive" camp, but here is one approach that uses sets, as food for thought:

docs = [
    """ Here's a sentence with dog and apple in it """,
    """ Here's a sentence with dog and poodle in it """,
    """ Here's a sentence with poodle and apple in it """,
    """ Here's a dog with and apple and a poodle in it """,
    """ Here's an apple with a dog to show that order is irrelevant """
]

query = ['dog', 'apple']

def get_similar(query, docs):
    res = []
    query_set = set(query)
    for doc in docs:
        # keep doc if all n elements of query appear among its words
        if query_set & set(doc.split(" ")) == query_set:
            res.append(doc)
    return res
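Calling get_similar(query, docs) returns the three documents that contain both 'dog' and 'apple': the first, fourth, and fifth entries of docs.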
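Admittedly, the time complexity is not great, since every document is still split and scanned for each query. But thanks to the speed of hash/set operations, it is much faster than doing the same membership checks with lists.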
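If the same documents are queried many times, one way to avoid re-splitting them on every call is to build the sets once up front. The following is only a sketch of that idea, not part of the approach above: it swaps in a small inverted index (token to set of document positions), and the names build_index and search are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each whitespace-separated token to the positions of the docs containing it."""
    index = defaultdict(set)
    for pos, doc in enumerate(docs):
        for token in doc.split():
            index[token].add(pos)
    return index

def search(index, docs, query):
    """Return the documents that contain every term in query."""
    if not query:
        return []
    # Intersect the posting sets, smallest first, so the working set shrinks quickly.
    postings = sorted((index.get(term, set()) for term in query), key=len)
    hits = set.intersection(*postings)
    return [docs[pos] for pos in sorted(hits)]

docs = [
    " Here's a sentence with dog and apple in it ",
    " Here's a sentence with dog and poodle in it ",
    " Here's a sentence with poodle and apple in it ",
    " Here's a dog with and apple and a poodle in it ",
    " Here's an apple with a dog to show that order is irrelevant ",
]

index = build_index(docs)            # built once
matches = search(index, docs, ["dog", "apple"])
print(len(matches))                  # prints 3
```

The index costs one pass over all documents; after that, each query only touches the posting sets of its own terms.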
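The second part of the answer: Elasticsearch is a good candidate if you are willing to put in the effort and you are dealing with a large amount of data.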
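For a rough idea of what the same "all terms must be present" search could look like there, here is a sketch that only builds the JSON body of a bool/must query in Elasticsearch's query DSL. The field name "text" and the helper build_all_terms_query are assumptions for illustration. No client call is made, and how the documents are indexed is left open.

```python
import json

def build_all_terms_query(terms, field="text"):
    """Build a search body whose bool/must clauses require every term in `field`."""
    return {
        "query": {
            "bool": {
                # One match clause per term; all of them must match.
                "must": [{"match": {field: term}} for term in terms]
            }
        }
    }

body = build_all_terms_query(["dog", "apple"])
print(json.dumps(body, indent=2))
```

Note that Elasticsearch analyzes text fields (for example, lowercasing with the default analyzer), so its matches may differ slightly from the exact whitespace split used above.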