When tokenizing annotated text, NLTK's WordPunctTokenizer and WhitespaceTokenizer can be used. For example:
from nltk.tokenize import WordPunctTokenizer, WhitespaceTokenizer

txt = 'red foxes <emotion>scare</emotion> me.'

# WordPunctTokenizer splits on punctuation runs, so the annotation
# tags are broken apart into separate tokens
token = WordPunctTokenizer().tokenize(txt)
print(token)
# ['red', 'foxes', '<', 'emotion', '>', 'scare', '</', 'emotion', '>', 'me', '.']

# WhitespaceTokenizer splits on whitespace only, so the annotated
# span stays intact (but trailing punctuation sticks to words)
token1 = WhitespaceTokenizer().tokenize(txt)
print(token1)
# ['red', 'foxes', '<emotion>scare</emotion>', 'me.']
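Neither tokenizer is ideal for annotated text on its own: WordPunctTokenizer shatters the tags, while WhitespaceTokenizer glues each tag to its content and keeps trailing punctuation attached. As a middle ground, here is a minimal sketch using the standard-library re module; the pattern and the helper name tokenize_annotated are assumptions for illustration, not part of the original example:

```python
import re

# Assumed pattern: match an XML-like tag, then a word, then a punctuation run.
# The tag alternative comes first so '<emotion>' is not split into '<', 'emotion', '>'.
TAG_WORD_PUNCT = re.compile(r'</?\w+>|\w+|[^\w\s]+')

def tokenize_annotated(text):
    """Tokenize text, keeping <tag> and </tag> markers as single tokens."""
    return TAG_WORD_PUNCT.findall(text)

print(tokenize_annotated('red foxes <emotion>scare</emotion> me.'))
# ['red', 'foxes', '<emotion>', 'scare', '</emotion>', 'me', '.']
```

This keeps each annotation marker as one token while still separating words from punctuation, which makes it easier to pair the emotion labels with the tokens they cover in a later step.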