
For Beginners | A Hands-On Guide to TextBlob


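Introduction

TextBlob is an open-source text-processing library written in Python. It covers many common natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and (in older releases) text translation.

GitHub: https://github.com/sloria/TextBlob

Documentation: https://textblob.readthedocs.io/en/dev/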


Hands-On
1. Installation

    # Install: pip install textblob
    # Install from a mirror in China: pip install textblob -i https://pypi.tuna.tsinghua.edu.cn/simple
    # Download the NLTK corpora TextBlob relies on: python -m textblob.download_corpora
    # Reference: https://textblob.readthedocs.io/en/dev/quickstart.html
    from textblob import TextBlob

    text = 'I love natural language processing! I am not like fish!'
    blob = TextBlob(text)

2. Part-of-Speech Tagging

    blob.tags

Output:

    [('I', 'PRP'),
     ('love', 'VBP'),
     ('natural', 'JJ'),
     ('language', 'NN'),
     ('processing', 'NN'),
     ('I', 'PRP'),
     ('am', 'VBP'),
     ('not', 'RB'),
     ('like', 'IN'),
     ('fish', 'NN')]
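
Because `blob.tags` is a plain list of (word, tag) pairs, it can be filtered with ordinary Python. The sketch below hard-codes the tag list printed above, so it runs without TextBlob installed. The helper name `words_with_tag_prefix` is made up for this example and is not part of TextBlob.

```python
# (word, tag) pairs copied from the blob.tags output above.
tags = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
        ('processing', 'NN'), ('I', 'PRP'), ('am', 'VBP'), ('not', 'RB'),
        ('like', 'IN'), ('fish', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with `prefix` (hypothetical helper)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, 'NN')  # matches NN, NNS, NNP, ...
verbs = words_with_tag_prefix(tags, 'VB')  # matches VB, VBP, VBD, ...
print(nouns)
print(verbs)
```

Matching on a tag prefix rather than an exact tag keeps singular, plural, and proper nouns together, which is usually what a keyword filter wants.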

3. Noun Phrase Extraction

    noun_phrases = blob.noun_phrases
    for phrase in noun_phrases:
        print(phrase)

Output:

    natural language processing

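4. Sentence Sentiment Scores

    # polarity ranges from -1.0 (negative) to 1.0 (positive)
    for sentence in blob.sentences:
        print(f'{sentence}------>{sentence.sentiment.polarity}')

Output as recorded in the original run (the second line was evidently produced with a different second sentence than the text defined in step 1):

    I love natural language processing!------>0.3125
    i am not like you!------>0.0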
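
TextBlob's default sentiment analyzer is lexicon-based: broadly, it looks up words that carry a known polarity and combines their scores. The following is a deliberately simplified sketch of that idea, not TextBlob's actual algorithm (which also deals with negation, intensifiers, and more). The lexicon and the function name `toy_polarity` are invented for illustration.

```python
# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
TOY_LEXICON = {"love": 0.5, "great": 0.8, "horrible": -1.0}


def toy_polarity(sentence):
    """Average the polarity of the words found in the lexicon (0.0 if none)."""
    words = [w.strip("!.,?").lower() for w in sentence.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("I love this great library!"))
print(toy_polarity("I am not like fish!"))
```

A sentence with no lexicon words scores 0.0 in this sketch, which is one plausible reason a neutral-looking sentence ends up with a polarity of zero.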

5. Tokenization (splitting text into sentences or words)

    # Word-level tokens
    for w in blob.words:
        print(w)

Output:

    I
    love
    natural
    language
    processing
    I
    am
    not
    like
    fish

    # Sentence-level tokens
    for s in blob.sentences:
        print(s)

Output:

    I love natural language processing!
    I am not like fish!
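
To make the two levels of tokenization concrete, here is a rough standard-library approximation using regular expressions. It is only a sketch under simple assumptions (English text, sentences ending in `.`, `!` or `?`); TextBlob's own tokenizers handle abbreviations, contractions, and other edge cases that these patterns do not.

```python
import re

text = 'I love natural language processing! I am not like fish!'

# Words: runs of letters (apostrophes allowed); punctuation is dropped.
words = re.findall(r"[A-Za-z']+", text)

# Sentences: split on whitespace that follows '.', '!' or '?'.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

print(words)
print(sentences)
```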

6. Word Inflection

    # pluralize() and singularize() are applied to every token regardless of
    # its part of speech, which is why forms such as "ams" and "nots" appear.
    for w in blob.words:
        # Plural form
        print(w.pluralize())
        # Singular form
        print(w.singularize())

Output:

    we
    I
    love
    love
    naturals
    natural
    languages
    language
    processings
    processing
    we
    I
    ams
    am
    nots
    not
    likes
    like
    fish
    fish

7. Lemmatization

    from textblob import Word

    # 'v' tells the lemmatizer to treat the word as a verb
    w = Word('went')
    print(w.lemmatize('v'))

    # Without an argument the word is treated as a noun
    w = Word('octopi')
    print(w.lemmatize())

Output:

    go
    octopus
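
`Word.lemmatize()` relies on WordNet, which broadly works by checking a list of irregular forms and then trying suffix rules against its vocabulary. The sketch below imitates that two-step lookup with tiny made-up tables; `EXCEPTIONS`, `SUFFIX_RULES`, `KNOWN_WORDS`, and `toy_lemmatize` are all hypothetical and only illustrate why the part-of-speech argument matters.

```python
# Hypothetical toy data; WordNet ships far larger exception lists and vocabulary.
EXCEPTIONS = {'v': {'went': 'go'}, 'n': {'octopi': 'octopus'}}
SUFFIX_RULES = {'n': [('ies', 'y'), ('es', ''), ('s', '')],
                'v': [('ing', ''), ('ed', ''), ('s', '')]}
KNOWN_WORDS = {'go', 'octopus', 'language', 'fish', 'love'}


def toy_lemmatize(word, pos='n'):
    """Look up irregular forms first, then try suffix rules against known words."""
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            if candidate in KNOWN_WORDS:
                return candidate
    return word  # no rule produced a known word: return the input unchanged


print(toy_lemmatize('went', 'v'))
print(toy_lemmatize('went'))
print(toy_lemmatize('octopi'))
print(toy_lemmatize('languages'))
```

In this sketch "went" is only reduced to "go" when it is looked up as a verb; treated as a noun it comes back unchanged, which mirrors why the example above passes 'v' explicitly.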

8. WordNet Integration

    from textblob import Word
    from textblob.wordnet import VERB

    word = Word('octopus')
    syn_word = word.synsets
    for syn in syn_word:
        print(syn)

Output:

    Synset('octopus.n.01')
    Synset('octopus.n.02')

    # Restrict the returned synsets to verbs
    syn_word1 = Word("hack").get_synsets(pos=VERB)
    for syn in syn_word1:
        print(syn)

Output:

    Synset('chop.v.05')
    Synset('hack.v.02')
    Synset('hack.v.03')
    Synset('hack.v.04')
    Synset('hack.v.05')
    Synset('hack.v.06')
    Synset('hack.v.07')
    Synset('hack.v.08')

    # Show the definition of each synset the word belongs to
    Word("beautiful").definitions

Output:

    ['delighting the senses or exciting intellectual or emotional admiration',
     '(of weather) highly enjoyable']

9. Spelling Correction

    sen = TextBlob('I lvoe naturl language processing!')
    print(sen.correct())

Output:

    I love nature language processing!

    # Word.spellcheck() returns spelling suggestions with confidence scores
    w1 = Word('good')
    w2 = Word('god')
    w3 = Word('gd')
    print(w1.spellcheck())
    print(w2.spellcheck())
    print(w3.spellcheck())

Output:

    [('good', 1.0)]
    [('god', 1.0)]
    [('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901),
     ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834),
     ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834),
     ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834),
     ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
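
According to the TextBlob documentation, spelling correction follows Peter Norvig's "How to Write a Spelling Corrector". A minimal sketch of that approach is shown here: generate every string one edit away, keep the ones that are known words, and rank them by how often they occur. The word counts are a hypothetical toy table and `toy_spellcheck` is not a TextBlob function, so the confidences will not match the output above.

```python
import string

# Hypothetical toy word counts; a real corrector learns these from a large corpus.
WORD_COUNTS = {"go": 5, "god": 2, "good": 10, "love": 4}


def edits1(word):
    """Return every string one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def toy_spellcheck(word):
    """Return (candidate, confidence) pairs, most likely first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: WORD_COUNTS[w], reverse=True)
    return [(w, WORD_COUNTS[w] / total) for w in ranked]


print(toy_spellcheck('good'))
print(toy_spellcheck('gd'))
print(toy_spellcheck('lvoe'))
```

The confidence is simply each candidate's share of the total count among the candidates, which is why a frequent short word such as "go" can outrank "god" for the input "gd".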

10. Parsing

    text = TextBlob('I lvoe naturl language processing!')
    print(text.parse())

Output:

    I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
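
The parse result is a single string in which each token carries slash-separated fields. The sketch below splits the string printed above into tuples; reading the four fields as word, part-of-speech tag, chunk tag, and prepositional-phrase tag is an assumption based on the pattern-style output format, and `split_parse` is a made-up helper.

```python
# Parse string copied from the output above.
parsed = ('I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O '
          'language/NN/I-NP/O processing/NN/I-NP/O !/./O/O')


def split_parse(parsed_text):
    """Split each whitespace-separated token into its slash-separated fields."""
    return [tuple(token.split('/')) for token in parsed_text.split()]


for word, pos, chunk, pnp in split_parse(parsed):
    print(f'{word:<12}{pos:<5}{chunk:<6}{pnp}')
```

This simple split assumes no token itself contains a slash; text with URLs or fractions would need more careful handling.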

11. N-Grams

    text = TextBlob('I lvoe naturl language processing!')
    print(text.ngrams(n=2))

Output:

    [WordList(['I', 'lvoe']), WordList(['lvoe', 'naturl']), WordList(['naturl', 'language']), WordList(['language', 'processing'])]
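
An n-gram list is just a sliding window of n consecutive words. The following plain-Python sketch reproduces the bigrams above from an already tokenized word list; the function name `ngrams` here is a standalone helper, not the TextBlob method.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ['I', 'lvoe', 'naturl', 'language', 'processing']
print(ngrams(words, 2))
print(ngrams(words, 3))
```

A text of k words yields k - n + 1 n-grams, and an empty list when n is larger than k.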

12. TextBlob in Practice: Naive Bayes Text Classification

    # A Naive Bayes classifier built with TextBlob
    # Reference: https://textblob.readthedocs.io/en/dev/classifiers.html#classifiers
    from textblob.classifiers import NaiveBayesClassifier

    # 1. Prepare the datasets: a training set and a test set
    train = [
        ('I love this sandwich.', 'pos'),
        ('this is an amazing place!', 'pos'),
        ('I feel very good about these beers.', 'pos'),
        ('this is my best work.', 'pos'),
        ("what an awesome view", 'pos'),
        ('I do not like this restaurant', 'neg'),
        ('I am tired of this stuff.', 'neg'),
        ("I can't deal with this", 'neg'),
        ('he is my sworn enemy!', 'neg'),
        ('my boss is horrible.', 'neg')
    ]
    test = [
        ('the beer was good.', 'pos'),
        ('I do not enjoy my job', 'neg'),
        ("I ain't feeling dandy today.", 'neg'),
        ("I feel amazing!", 'pos'),
        ('Gary is a friend of mine.', 'pos'),
        ("I can't believe I'm doing this.", 'neg')
    ]

    # 2. Create the Naive Bayes classifier and
    # 3. train it on the training set
    nb_model = NaiveBayesClassifier(train)

    # 4. Classify a new sample
    dev_sen = "This is an amazing library!"
    print(nb_model.classify(dev_sen))

Output:

    pos

    # The probability of belonging to a given class is also available
    dev_sen_prob = nb_model.prob_classify(dev_sen)
    print(dev_sen_prob.prob("pos"))

Output:

    0.980117820324005

    # 5. Compute the model's accuracy on the test set
    print(nb_model.accuracy(test))

Output:

    0.8333333333333334
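
To show what happens behind `classify` and `accuracy`, here is a small from-scratch Naive Bayes sketch using only the standard library. It is a multinomial variant with add-one smoothing on raw word counts, which is an assumption made for brevity and differs from the feature extraction TextBlob uses, so its numbers will not match the output above. The training data is a punctuation-free subset of the sentences above, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_nb(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def classify(model, text):
    """Pick the label with the highest log prior plus summed log likelihoods."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(classify(model, text) == label for text, label in examples)
    return correct / len(examples)


train = [('I love this sandwich', 'pos'), ('this is an amazing place', 'pos'),
         ('I do not like this restaurant', 'neg'), ('my boss is horrible', 'neg')]
model = train_nb(train)
print(classify(model, 'an amazing sandwich'))
print(accuracy(model, [('an amazing sandwich', 'pos'), ('my boss is horrible', 'neg')]))
```

Accuracy is simply correct predictions divided by the number of test examples, so a score of 0.8333333333333334 on a six-sentence test set corresponds to five correct predictions out of six.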

The code is available at:
1. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlobDemo.ipynb
2. https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/TextBlob2TextClassifier.ipynb
