
Python NLP Notes (4): A Simple Full-Text Retrieval System


This post implements a simple full-text retrieval system over the NLTK movie reviews corpus.

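    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Wed Apr 4 15:28:11 2018
    @author: dag
    """
    import re

    import nltk


    def raw(file):
        """Return the file's text with markup tags removed and whitespace collapsed."""
        with open(file) as f:
            contents = f.read()
        contents = re.sub(r'<.*?>', ' ', contents)
        contents = re.sub(r'\s+', ' ', contents)
        return contents


    def snippet(doc, term):
        """Return 30 characters of context on each side of the start of the term."""
        text = ' ' * 30 + raw(doc) + ' ' * 30
        # Match the term as a whitespace-delimited token, the same way the index
        # was built. A plain text.index(term) can land inside a longer word.
        match = re.search(r'(?<!\S)' + re.escape(term) + r'(?!\S)', text)
        pos = match.start()
        return text[pos - 30:pos + 30]


    print("Building Index...")
    files = nltk.corpus.movie_reviews.abspaths()
    # Map every whitespace-delimited token to the files it occurs in.
    idx = nltk.Index((w, f) for f in files for w in raw(f).split())

    while True:
        query = input("query> ")
        if query == "quit":
            break  # leave without searching for the word "quit" itself
        if query in idx:
            # A file is listed once per occurrence; dict.fromkeys drops the
            # repeats while keeping the original order.
            for doc in dict.fromkeys(idx[query]):
                print(snippet(doc, query))
        else:
            print("Not found")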
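The index built by `nltk.Index` maps each token to the list of files that contain it, which is an inverted index. As a minimal sketch of the same idea without NLTK or the corpus on disk, the following uses `collections.defaultdict` over a few in-memory documents. The document names and texts in `docs` and the helper `build_index` are made up for illustration and are not part of the original script.

```python
from collections import defaultdict

# Hypothetical toy documents standing in for the movie review files.
docs = {
    "a.txt": "a natural disaster movie",
    "b.txt": "her natural abilities",
    "c.txt": "no curve balls here",
}


def build_index(documents):
    """Map each whitespace-delimited token to the documents that contain it."""
    index = defaultdict(list)
    for name, text in documents.items():
        # set() keeps a document from being listed twice for a repeated word
        for word in sorted(set(text.split())):
            index[word].append(name)
    return index


index = build_index(docs)
print(index["natural"])    # documents containing the token "natural"
print("comedy" in index)   # membership test, does not create a new key
```

Looking up a query is then a single dictionary access, which is why the script can answer queries immediately once the index has been built.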

Output:

    runfile('/Users/dag/.spyder-py3/filesearch.py', wdir='/Users/dag/.spyder-py3')
    Building Index...
    query> natural
    drug usage as being a hip and natural part of the art scene
    twister , this is yet another natural disaster movie that do
    , such as bull durham and the natural . no curve balls here
    great abandon . rejecting her natural abilities , she has sp
    due to the exhausting of our natural resources . each membe
    rn kentucky . it's beauty and natural goodness is being slow
    ilming in order to get a more natural expression of fear out
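Each output line is a 60-character window around the query term: 30 characters before the start of the term and 30 from its start onward. One pitfall when cutting such a window is that `str.index` returns the first substring occurrence, which may lie inside a longer word. The sketch below contrasts substring matching with whole-token matching on an in-memory string. The function names `window_substring` and `window_token`, the `width` parameter, and the sample text are illustrative assumptions, not part of the original script.

```python
import re


def window_substring(text, term, width=30):
    """Context window around the first substring occurrence of term."""
    padded = " " * width + text + " " * width
    pos = padded.index(term)
    return padded[pos - width:pos + width]


def window_token(text, term, width=30):
    """Context window around the first whitespace-delimited occurrence of term."""
    padded = " " * width + text + " " * width
    match = re.search(r"(?<!\S)" + re.escape(term) + r"(?!\S)", padded)
    if match is None:
        return None
    pos = match.start()
    return padded[pos - width:pos + width]


text = "an unnatural calm , then a natural disaster"
print(repr(window_substring(text, "natural", width=10)))  # centred inside "unnatural"
print(repr(window_token(text, "natural", width=10)))      # centred on the word "natural"
```

Padding both ends with `width` spaces keeps the slice bounds valid even when the term sits at the very start or end of the text.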
