
Extracting English Keywords with TF-IDF Using sklearn

python sklearn tf-idf English

Demo1

TfidfTransformer + CountVectorizer = TfidfVectorizer
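
The relationship above can be checked directly. The following is a minimal sketch, assuming default parameters for all three classes: it runs the two-step pipeline (counts, then weighting) and the one-step `TfidfVectorizer` on the same corpus and compares the resulting matrices. The variable names (`counts`, `two_step`, `one_step`, `same`) are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

corpus = [
    'This This is the first document.',
    'This This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Two-step pipeline: raw term counts first, then TF-IDF weighting.
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# One-step equivalent: tokenizing, counting and weighting in a single class.
one_step = TfidfVectorizer().fit_transform(corpus)

# Both results are sparse matrices; compare them as dense arrays.
same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

The two-step form is mainly useful when the raw counts are needed for something else as well; otherwise `TfidfVectorizer` alone is shorter.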

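from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This This is the first document.',
    'This This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Learn the vocabulary and IDF weights, then build the TF-IDF matrix.
tfidf_model = TfidfVectorizer()
tfidf_matrix = tfidf_model.fit_transform(corpus)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
# tolist() turns the NumPy array into a plain list of Python strings.
word_dict = tfidf_model.get_feature_names_out().tolist()
print(word_dict)

# Printed as (document index, vocabulary index) followed by the TF-IDF weight.
print(tfidf_matrix)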

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
(0, 1) 0.3493402123185688
(0, 2) 0.431504661587479
(0, 6) 0.2856085141790751
(0, 3) 0.3493402123185688
(0, 8) 0.6986804246371376
(1, 5) 0.7717016211057586
(1, 1) 0.24628357422338598
(1, 6) 0.20135295972313796
(1, 3) 0.24628357422338598
(1, 8) 0.49256714844677196
(2, 4) 0.5528053199908667
(2, 7) 0.5528053199908667
(2, 0) 0.5528053199908667
(2, 6) 0.2884767487500274
(3, 1) 0.4387767428592343
(3, 2) 0.5419765697264572
(3, 6) 0.35872873824808993
(3, 3) 0.4387767428592343
(3, 8) 0.4387767428592343

Parameter settings

About the parameters:

  • input: string {'filename', 'file', 'content'}
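
The three values of `input` tell the vectorizer how to interpret the items passed to `fit` or `fit_transform`: as the text itself (`'content'`, the default), as file-like objects with a `read()` method (`'file'`), or as paths to files to be read (`'filename'`). The sketch below feeds the same two texts in all three ways; the file names `doc0.txt` and `doc1.txt` and the variable names are made up for the example.

```python
import io
import os
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ['This is the first document.', 'And the third one.']

# input='content' (default): each item is the text itself.
from_content = TfidfVectorizer(input='content').fit_transform(texts)

# input='file': each item is a file-like object with a read() method.
from_file = TfidfVectorizer(input='file').fit_transform(
    [io.StringIO(t) for t in texts]
)

# input='filename': each item is a path; the vectorizer opens and reads it.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, t in enumerate(texts):
        path = os.path.join(tmp, f'doc{i}.txt')  # hypothetical file name
        with open(path, 'w', encoding='utf-8') as fh:
            fh.write(t)
        paths.append(path)
    from_filename = TfidfVectorizer(input='filename').fit_transform(paths)

# All three routes should yield the same TF-IDF matrix.
same_file = np.allclose(from_content.toarray(), from_file.toarray())
same_filename = np.allclose(from_content.toarray(), from_filename.toarray())
print(same_file, same_filename)
```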
