I. NLTK Environment Setup
1. Install the nltk package (if the install works at first but then suddenly errors out in red, just run it a few more times)
pip install nltk
2. In the Python console
# 1. Import the package
import nltk
# 2. Download the basic data
nltk.download()
Note: if the online download fails, you can download the data from the official site yourself and put it into the designated folder.
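If the nltk.download() window keeps failing, you can also fetch just the resources this tutorial needs by name, or register a local data folder. A minimal sketch (the folder path below is only an example, adjust it to your machine):
import nltk
# download only the resources used below, instead of opening the full downloader GUI
nltk.download('punkt')                        # Punkt sentence-tokenizer models
nltk.download('averaged_perceptron_tagger')   # English POS-tagger model
# for a fully offline setup, unzip the packages into a local nltk_data folder
# (keeping its sub-folder layout) and tell NLTK where to look
nltk.data.path.append(r'D:\nltk_data')        # example path only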
II. English Tokenization with NLTK
Anaconda is used as the interpreter here.
1. Splitting a paragraph into sentences
import nltk
# A piece of text to work with
text = "In the coming new term, there will be many challenging exams. Firstly, in June, there is a College English Test Band Four. In May, Certificate of Accounting Professional is around the corner. Without sufficient preparations, I can hardly expect to pass those exams. So I have to plan more time to take enough preparation."
# (1) Split the paragraph into sentences
# language is a keyword argument (shown in red in the IDE); the default is 'english'
# sent_tokenize splits the text into sentences (using sentence-final punctuation such as '.' as the delimiter)
tokenize = nltk.sent_tokenize(text, language='english')
print(tokenize)
Output:
['In the coming new term, there will be many challenging exams.', 'Firstly, in June, there is a College English Test Band Four.', 'In May, Certificate of Accounting Professional is around the corner.', 'Without sufficient preparations, I can hardly expect to pass those exams.', 'So I have to plan more time to take enough preparation.']
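Note that sent_tokenize uses the pre-trained Punkt model rather than a naive split on '.', so it is normally not fooled by common abbreviations. A quick check with a sentence of my own:
import nltk
sample = "I met Dr. Smith yesterday. He was very kind."
print(nltk.sent_tokenize(sample, language='english'))
# expected: ['I met Dr. Smith yesterday.', 'He was very kind.']
# (no break is made after the abbreviation 'Dr.')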
2. Word tokenization
# (2) Tokenize each sentence into words
# append adds each sentence's token list to the words list
# word_tokenize splits a sentence into word tokens
words = []
for word in tokenize:
    words.append(nltk.word_tokenize(word))
print(words)
Output:
[['In', 'the', 'coming', 'new', 'term', ',', 'there', 'will', 'be', 'many', 'challenging', 'exams', '.'], ['Firstly', ',', 'in', 'June', ',', 'there', 'is', 'a', 'College', 'English', 'Test', 'Band', 'Four', '.'], ['In', 'May', ',', 'Certificate', 'of', 'Accounting', 'Professional', 'is', 'around', 'the', 'corner', '.'], ['Without', 'sufficient', 'preparations', ',', 'I', 'can', 'hardly', 'expect', 'to', 'pass', 'those', 'exams', '.'], ['So', 'I', 'have', 'to', 'plan', 'more', 'time', 'to', 'take', 'enough', 'preparation', '.']]
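Because word_tokenize is called once per sentence, words is a list of lists. A common follow-up step is to flatten it and count token frequencies with nltk.FreqDist; here is a minimal sketch that reuses the words variable built above:
from itertools import chain
import nltk
flat_words = list(chain.from_iterable(words))   # flatten the per-sentence token lists
freq = nltk.FreqDist(w.lower() for w in flat_words if w.isalpha())
print(freq.most_common(5))                      # the five most frequent word tokens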
3. Part-of-speech tagging
# (3) Part-of-speech tagging
# pos_tag performs the part-of-speech analysis
wordtagging = []
for cixing in words:
    wordtagging.append(nltk.pos_tag(cixing))
print(wordtagging)
Output:
[[('In', 'IN'), ('the', 'DT'), ('coming', 'VBG'), ('new', 'JJ'), ('term', 'NN'), (',', ','), ('there', 'EX'), ('will', 'MD'), ('be', 'VB'), ('many', 'JJ'), ('challenging', 'VBG'), ('exams', 'NNS'), ('.', '.')], [('Firstly', 'RB'), (',', ','), ('in', 'IN'), ('June', 'NNP'), (',', ','), ('there', 'EX'), ('is', 'VBZ'), ('a', 'DT'), ('College', 'NNP'), ('English', 'NNP'), ('Test', 'NNP'), ('Band', 'NNP'), ('Four', 'NNP'), ('.', '.')], [('In', 'IN'), ('May', 'NNP'), (',', ','), ('Certificate', 'NNP'), ('of', 'IN'), ('Accounting', 'NNP'), ('Professional', 'NNP'), ('is', 'VBZ'), ('around', 'IN'), ('the', 'DT'), ('corner', 'NN'), ('.', '.')], [('Without', 'IN'), ('sufficient', 'JJ'), ('preparations', 'NNS'), (',', ','), ('I', 'PRP'), ('can', 'MD'), ('hardly', 'RB'), ('expect', 'VB'), ('to', 'TO'), ('pass', 'VB'), ('those', 'DT'), ('exams', 'NNS'), ('.', '.')], [('So', 'RB'), ('I', 'PRP'), ('have', 'VBP'), ('to', 'TO'), ('plan', 'VB'), ('more', 'JJR'), ('time', 'NN'), ('to', 'TO'), ('take', 'VB'), ('enough', 'JJ'), ('preparation', 'NN'), ('.', '.')]]
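The tags in the output are Penn Treebank tags (NN = singular noun, VBG = gerund, and so on). NLTK can print a short description of any tag via nltk.help.upenn_tagset, which needs the 'tagsets' resource to be downloaded first:
import nltk
nltk.download('tagsets')          # tag documentation used by nltk.help
nltk.help.upenn_tagset('NN')      # noun, singular or mass
nltk.help.upenn_tagset('VBG')     # verb, gerund or present participle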
III. Chinese Tokenization with NLTK
1. Splitting a paragraph into sentences
import nltk
# Processing Chinese text
# For sentence splitting on Chinese text, the delimiter must be '.' to be recognized correctly (and the '.' must be followed by a space)
text1 = '同是风华正茂,怎敢甘拜下风 . 保持学习,保持饥饿'
Juzi_chinese = nltk.sent_tokenize(text1)
print(Juzi_chinese)
Output:
['同是风华正茂,怎敢甘拜下风 .', '保持学习,保持饥饿']
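For comparison, if the two sentences are joined with the Chinese full stop '。' instead of '. ', the English Punkt model does not treat it as a sentence boundary, so the whole string should come back as a single sentence:
import nltk
text2 = '同是风华正茂,怎敢甘拜下风。保持学习,保持饥饿'
print(nltk.sent_tokenize(text2))
# expected: the whole string as one element, because '。' is not a
# sentence-ending character for the English Punkt model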
2. Word tokenization
# Here tokenization is applied to the whole text, not sentence by sentence
# word_tokenize performs the tokenization
tokens=nltk.word_tokenize(text1)
print(tokens)
Output:
['同是', '风华', '正茂,怎敢', '甘拜', '下风', '.', '保持', '学习,保持', '饥饿']
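As the output shows, word_tokenize has no Chinese word-segmentation model, so it only breaks the string on spaces and the ASCII period rather than on real word boundaries. A common workaround, sketched here under the assumption that the third-party jieba package is installed (pip install jieba; it is not part of NLTK), is to segment the text with jieba first and then hand the tokens to NLTK utilities:
import jieba
import nltk
text1 = '同是风华正茂,怎敢甘拜下风 . 保持学习,保持饥饿'
# jieba performs dictionary-based Chinese word segmentation
tokens = jieba.lcut(text1)
print(tokens)
# the segmented tokens can then be used with NLTK tools, e.g. a frequency count
freq = nltk.FreqDist(t for t in tokens if t.strip())
print(freq.most_common(3))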
———— Stay hungry, keep learning
Jackson_MVP