
Sentence Splitting and Word Tokenization with NLTK


1. Split a paragraph into sentences (Punkt sentence tokenizer)

import nltk.data

def splitSentence(paragraph):
    # Load the pre-trained Punkt sentence tokenizer for English
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    sentences = tokenizer.tokenize(paragraph)
    return sentences

if __name__ == '__main__':
    print(splitSentence("My name is Tom. I am a boy. I like soccer!"))

Output: ['My name is Tom.', 'I am a boy.', 'I like soccer!']
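Note that in current NLTK releases the Punkt model has to be downloaded once before it can be loaded, and nltk.sent_tokenize gives the same result without referencing the pickle path directly. A minimal sketch:

import nltk

nltk.download('punkt')  # one-time download of the Punkt sentence models

# sent_tokenize uses the Punkt tokenizer under the hood
print(nltk.sent_tokenize("My name is Tom. I am a boy. I like soccer!"))
# ['My name is Tom.', 'I am a boy.', 'I like soccer!']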

2. Split a sentence into words

from nltk.tokenize import WordPunctTokenizer

def wordtokenizer(sentence):
    # Split the sentence into alphabetic and punctuation tokens
    words = WordPunctTokenizer().tokenize(sentence)
    return words

if __name__ == '__main__':
    print(wordtokenizer("My name is Tom."))

Output: ['My', 'name', 'is', 'Tom', '.']
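WordPunctTokenizer splits on every run of punctuation, which also breaks contractions apart. If Treebank-style tokens are wanted instead (e.g., "Don't" becoming "Do" and "n't"), nltk.word_tokenize is the usual alternative; a minimal sketch of the difference (word_tokenize also needs the Punkt models downloaded):

import nltk
from nltk.tokenize import WordPunctTokenizer

nltk.download('punkt')  # required by word_tokenize

sentence = "Don't hesitate."
print(WordPunctTokenizer().tokenize(sentence))  # ['Don', "'", 't', 'hesitate', '.']
print(nltk.word_tokenize(sentence))             # ['Do', "n't", 'hesitate', '.']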
