Demo1
TfidfVectorizer = CountVectorizer followed by TfidfTransformer
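from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This This is the first document.',
    'This This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Learn the vocabulary and IDF weights, then build the TF-IDF matrix in one call.
tfidf_model = TfidfVectorizer()
tfidf_matrix = tfidf_model.fit_transform(corpus)

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out()
# returns a NumPy array, so convert it to a plain list before printing.
word_dict = tfidf_model.get_feature_names_out().tolist()

print(word_dict)     # vocabulary, sorted alphabetically
print(tfidf_matrix)  # sparse matrix: (document index, term index) weight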
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
(0, 1) 0.3493402123185688
(0, 2) 0.431504661587479
(0, 6) 0.2856085141790751
(0, 3) 0.3493402123185688
(0, 8) 0.6986804246371376
(1, 5) 0.7717016211057586
(1, 1) 0.24628357422338598
(1, 6) 0.20135295972313796
(1, 3) 0.24628357422338598
(1, 8) 0.49256714844677196
(2, 4) 0.5528053199908667
(2, 7) 0.5528053199908667
(2, 0) 0.5528053199908667
(2, 6) 0.2884767487500274
(3, 1) 0.4387767428592343
(3, 2) 0.5419765697264572
(3, 6) 0.35872873824808993
(3, 3) 0.4387767428592343
(3, 8) 0.4387767428592343
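The equivalence stated at the top of this demo can be checked directly. The sketch below is only an illustration, assuming default parameters for all three classes: it builds the matrix once with TfidfVectorizer and once with CountVectorizer followed by TfidfTransformer, then compares the two results. The variable names (direct, counts, two_step, same) are made up for this example.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

corpus = [
    'This This is the first document.',
    'This This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# One step: tokenize, count, and apply TF-IDF weighting together.
direct = TfidfVectorizer().fit_transform(corpus)

# Two steps: raw term counts first, then TF-IDF weighting on those counts.
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# With default settings both routes should give the same weights.
same = np.allclose(direct.toarray(), two_step.toarray())
print(same)
```

With the defaults this should print True. The two-step form is mainly useful when the raw counts are needed for something else as well.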
Parameter settings
About the parameters:
input: string {'filename', 'file', 'content'}
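A minimal sketch of how the three input values differ, assuming a small two-document corpus made up for this example: 'content' (the default) takes the text itself, 'file' takes objects that have a read() method, and 'filename' takes paths that the vectorizer opens itself. The temporary files and variable names here are illustrative only.

```python
import io
import os
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['This is the first document.', 'And the third one.']

# input='content' (default): each item is the document text itself.
from_content = TfidfVectorizer(input='content').fit_transform(docs)

# input='file': each item is a file-like object; the vectorizer calls read().
handles = [io.StringIO(d) for d in docs]
from_file = TfidfVectorizer(input='file').fit_transform(handles)

# input='filename': each item is a path; the vectorizer opens and reads it.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(docs):
        path = os.path.join(tmp, f'doc{i}.txt')
        with open(path, 'w', encoding='utf-8') as fh:
            fh.write(text)
        paths.append(path)
    from_filename = TfidfVectorizer(input='filename').fit_transform(paths)

# All three routes see the same text, so the matrices should match.
file_matches = np.allclose(from_content.toarray(), from_file.toarray())
filename_matches = np.allclose(from_content.toarray(), from_filename.toarray())
print(file_matches, filename_matches)
```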