First, open a terminal (Anaconda Prompt) and install nltk:
pip install nltk
Then open a Python shell or Anaconda's Spyder and enter the following to download the NLTK data packages (corpora and tokenizer models):
import nltk
nltk.download()
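Note: For detailed steps or other installation methods, see the article "Installing the jieba and NLTK Libraries with Anaconda3" (Anaconda3安装jieba库和NLTK库).
Because English sentences consist essentially of words, spaces, and punctuation, tokenizing them mostly comes down to splitting the text into a list at spaces and punctuation marks, so it is comparatively simple: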
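To illustrate the idea of splitting on spaces and punctuation, here is a minimal sketch that uses only the standard-library `re` module. The function name `simple_tokenize` is hypothetical and is not part of NLTK; NLTK's own tokenizer handles many more cases (contractions, abbreviations, and so on), so treat this only as a rough approximation.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens (hypothetical helper)."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I was just a kid, and loved it very much!")
print(tokens)
# ['I', 'was', 'just', 'a', 'kid', ',', 'and', 'loved', 'it', 'very', 'much', '!']
```

Whitespace is discarded and each punctuation mark becomes its own token, which is roughly the shape of the NLTK output shown in the examples that follow.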
(1) Word tokenization:
from nltk import word_tokenize  # splits text into word and punctuation tokens
paragraph = "The first time I heard that song was in Hawaii on radio. I was just a kid, and loved it very much! What a fantastic song!"
words = word_tokenize(paragraph)
print(words)
Output:
['The', 'first', 'time', 'I', 'heard', 'that', 'song', 'was', 'in', 'Hawaii', 'on', 'radio', '.', 'I', 'was', 'just', 'a', 'kid', ',', 'and', 'loved', 'it', 'very', 'much', '!', 'What', 'a', 'fantastic', 'song', '!']
(2) Sentence splitting:
from nltk import sent_tokenize  # splits text into sentences at sentence-ending punctuation
sentences = "The first time I heard that song was in Hawaii on radio. I was just a kid, and loved it very much! What a fantastic song!"
sentence = sent_tokenize(sentences)
print(sentence)
Output:
['The first time I heard that song was in Hawaii on radio.', 'I was just a kid, and loved it very much!', 'What a fantastic song!']
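Note: Whether you tokenize into words or into sentences, NLTK returns the result as a list.
The next example lowercases the paragraph, tokenizes it, and then filters out the punctuation tokens: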
from nltk import word_tokenize
paragraph = "The first time I heard that song was in Hawaii on radio. I was just a kid, and loved it very much! What a fantastic song!".lower()
cutwords1 = word_tokenize(paragraph)  # tokenize into words
print('[NLTK tokenization result:]')
print(cutwords1)
interpunctuations = [',', '.', ':', ';', '?', '(', ')', '[', ']', '&', '!', '*', '@', '#', '$', '%']  # punctuation marks to remove
cutwords2 = [word for word in cutwords1 if word not in interpunctuations]  # drop punctuation tokens
print('\n[NLTK tokenization result with punctuation removed:]')
print(cutwords2)
Output:
[NLTK tokenization result:]
['the', 'first', 'time', 'i', 'heard', 'that', 'song', 'was', 'in', 'hawaii', 'on', 'radio', '.', 'i', 'was', 'just', 'a', 'kid', ',', 'and', 'loved', 'it', 'very', 'much', '!', 'what', 'a', 'fantastic', 'song', '!']
[NLTK tokenization result with punctuation removed:]
['the', 'first', 'time', 'i', 'heard', 'that', 'song', 'was', 'in', 'hawaii', 'on', 'radio', 'i', 'was', 'just', 'a', 'kid', 'and', 'loved', 'it', 'very', 'much', 'what', 'a', 'fantastic', 'song']
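A hand-written punctuation list can miss symbols. As an alternative sketch (not the method used above), the standard-library constant `string.punctuation` covers all ASCII punctuation characters. The helper name `remove_punctuation` is hypothetical, and the input is assumed to be a token list such as the one `word_tokenize` returns.

```python
import string

def remove_punctuation(tokens):
    """Drop tokens made up entirely of ASCII punctuation (hypothetical helper)."""
    punctuation = set(string.punctuation)
    # Keep a token only if at least one of its characters is not punctuation.
    return [tok for tok in tokens if not all(ch in punctuation for ch in tok)]

tokens = ['what', 'a', 'fantastic', 'song', '!', '...']
print(remove_punctuation(tokens))
# ['what', 'a', 'fantastic', 'song']
```

Checking every character of a token, rather than testing membership in a fixed list, also removes multi-character tokens such as '...'. Note that this only covers ASCII punctuation; full-width or typographic marks would need to be added separately.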