
Common NLP Text Preprocessing Techniques

1. Removing punctuation

    import re

    def removeBianDian(self, word):
        # Python 3: only bytes need decoding (the Python 2 original decoded str to unicode)
        if isinstance(word, bytes):
            word = word.decode("utf8")
        # Delete runs of ASCII punctuation and of Chinese/typographic punctuation
        return re.sub(r"[.!/_,$%^*(+\"']+|[+—!,。?、~@·#¥%…&*(:)-]+", "", word)
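An explicit character list only removes the symbols spelled out in it. As an alternative, here is a minimal sketch that drops every character whose Unicode general category starts with `P` (punctuation), using the standard `unicodedata` module. The name `strip_punctuation` and the `drop_symbols` flag are hypothetical and not part of the original code. Note that characters such as `$`, `¥`, `^`, `+` and `~` are classified as symbols (`S*`), not punctuation, so they are kept unless `drop_symbols=True`.

```python
import unicodedata

def strip_punctuation(text, drop_symbols=False):
    """Remove Unicode punctuation (P*), and optionally symbols (S*), from text."""
    prefixes = ("P", "S") if drop_symbols else ("P",)
    return "".join(
        ch for ch in text
        # category() returns a two-letter code such as "Po" or "Sm"
        if not unicodedata.category(ch).startswith(prefixes)
    )

print(strip_punctuation("你好,世界!Hello, world..."))  # 你好世界Hello world
```

This covers full-width and half-width punctuation alike, at the cost of less control over exactly which characters are removed.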

2. Converting full-width characters to half-width

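    def strQ2B(self, ustring):
        """Convert full-width characters to half-width."""
        # Python 3: only bytes need decoding
        if isinstance(ustring, bytes):
            ustring = ustring.decode("utf8")
        rstring = ""
        for uchar in ustring:
            inside_code = ord(uchar)
            if inside_code == 12288:  # full-width (ideographic) space
                inside_code = 32
            elif 65281 <= inside_code <= 65374:  # full-width ASCII variants
                # The source listing breaks off at this branch; the rest is
                # reconstructed from the fixed Unicode offset of 65248 (0xFEE0).
                inside_code -= 65248
            rstring += chr(inside_code)
        return rstring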
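The same mapping can also be written as a translation table, which avoids the character-by-character loop. The sketch below is a minimal version, assuming Python 3; `FULL_TO_HALF` and `to_half_width` are hypothetical names, not part of the original code. It relies on the fact that the full-width ASCII variants (code points 65281 to 65374, i.e. U+FF01 to U+FF5E) sit exactly 65248 (0xFEE0) above their half-width counterparts, while the full-width space (12288, U+3000) is handled separately.

```python
# Ideographic space -> ASCII space
FULL_TO_HALF = {0x3000: 0x20}
# Full-width ASCII variants U+FF01..U+FF5E -> U+0021..U+007E
FULL_TO_HALF.update({code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)})

def to_half_width(text):
    """Replace full-width characters with their half-width equivalents."""
    # str.translate accepts a dict mapping code points to code points
    return text.translate(FULL_TO_HALF)

print(to_half_width("ABC 123!"))  # ABC 123!
```

If broader normalization is acceptable, `unicodedata.normalize("NFKC", text)` also folds full-width forms to ASCII, but it rewrites other compatibility characters as well, so it is not a drop-in replacement.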