
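Common NLP Toolkits in Practice (4) The spaCy Toolkit: Text Processing, Part-of-Speech Tagging, Named Entity Recognition, Case Study 1 (Finding All Character Names), Case Study 2 (Analyzing Terrorist-Attack Text Data)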

rand-terrorism-dataset.txt

Import the packages and load the English model

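# Install the English model first: python -m spacy download en_core_web_sm
# (on Windows, open CMD as administrator)
import spacy
from spacy import displacy
from collections import Counter, defaultdict
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The 'en' shortcut was removed in spaCy v3; load the model by its full name
nlp = spacy.load('en_core_web_sm')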

1 Text processing

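doc = nlp('Weather is good, very windy and sunny. We have no classes in the afternoon.')
# Tokenization: iterating over a Doc yields its tokens
for token in doc:
    print(token)
# Sentence segmentation: doc.sents yields one Span per sentence
for sent in doc.sents:
    print(sent)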
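Tokenization and rule-based sentence splitting do not depend on the downloaded statistical model. The following is a minimal sketch, assuming spaCy v3 or later, that runs the same sentence through a blank English pipeline with the built-in `sentencizer` component. The names `blank_nlp`, `tokens`, and `sentences` are illustrative and do not come from the original post.

```python
import spacy

# A blank English pipeline contains only the tokenizer (no model download needed).
blank_nlp = spacy.blank("en")
# The sentencizer splits sentences on punctuation; this call uses the spaCy v3 API.
blank_nlp.add_pipe("sentencizer")

text = "Weather is good, very windy and sunny. We have no classes in the afternoon."
doc = blank_nlp(text)

# Collect results into plain Python lists so they are easy to inspect.
tokens = [token.text for token in doc]
sentences = [sent.text for sent in doc.sents]

print(len(tokens), "tokens")
for sentence in sentences:
    print(sentence)
```

Sentence boundaries from the full model come from its parser and may differ from this punctuation-based split on harder text.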

2 Part-of-speech tagging

for token in doc:
    print('{}-{}'.format(token, token.pos_))

3 Named entity recognition

doc_2 = nlp("I went to Paris where I met my old friend Jack from uni.")
for ent in doc_2.ents:
    print('{}-{}'.format(ent, ent.label_))
doc3 =<
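The title's first case study (finding all character names) comes down to keeping the entities labeled `PERSON` and counting them. The sketch below shows one way that counting step could look using only `collections.Counter`. The helper `count_entities` and the `sample_entities` list are hypothetical: the pairs are hand-written stand-ins in the shape of `[(ent.text, ent.label_) for ent in doc.ents]`, not real model output and not the original author's code.

```python
from collections import Counter


def count_entities(entities, label="PERSON"):
    """Count how often each entity text appears with the given label."""
    return Counter(text for text, ent_label in entities if ent_label == label)


# Hypothetical (text, label) pairs, hand-written for illustration only.
# With a real Doc you would build this list as:
#   [(ent.text, ent.label_) for ent in doc.ents]
sample_entities = [
    ("Paris", "GPE"),
    ("Jack", "PERSON"),
    ("Jack", "PERSON"),
    ("Elizabeth", "PERSON"),
]

person_counts = count_entities(sample_entities)
# most_common sorts by descending count
print(person_counts.most_common(2))
```

Passing a different label, for example `count_entities(sample_entities, "GPE")`, tallies place names in the same way.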