
Counting word frequencies with TfidfVectorizer

The snippet below segments a Chinese sentence with jieba, joins the tokens back into one string, and feeds it to scikit-learn's TfidfVectorizer to obtain the vocabulary and the TF-IDF weight matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
import jieba

# text = ['This is the first document.', 'This is the second second document.', 'And the third one.',
#         'Is this the first document?', ]
# 
# tf = TfidfVectorizer(min_df=1)
#
# X = tf.fit_transform(text)
# names = tf.get_feature_names()
# print(names)
# print(X.toarray())


text = '今天天气真好,我要去北京天安门玩,要去景山攻牙之后,玩完大明劫'
# Segment the sentence with jieba in accurate mode (cut_all=False)
text_list = jieba.cut(text, cut_all=False)
# Re-join the tokens into one string; TfidfVectorizer's default tokenizer
# splits on non-word characters, so a comma (or a space) works as a delimiter
text_list = ",".join(text_list)
context = []
context.append(text_list)
print(context)

tf = TfidfVectorizer(min_df=1)

X = tf.fit_transform(context)
# get_feature_names_out() in scikit-learn >= 1.0 (older versions use get_feature_names())
names = tf.get_feature_names_out()

print(names)
print(X.toarray())
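Note that TfidfVectorizer returns TF-IDF weights rather than raw counts; with only a single document every term gets the same idf, so the values reduce to L2-normalized term frequencies. Below is a minimal sketch (the names cv and counts are illustrative, not from the original post) of how you might pair each term with its weight, and how CountVectorizer gives plain integer counts if word frequency is what you actually need:

# Pair each vocabulary term with its TF-IDF weight for the single document;
# `names` and `X` come from the code above.
for word, weight in zip(names, X.toarray()[0]):
    print(word, weight)

# For raw term counts instead of TF-IDF weights, CountVectorizer exposes the
# same fit_transform / get_feature_names_out interface:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(min_df=1)
counts = cv.fit_transform(context)
print(cv.get_feature_names_out())  # same vocabulary as above
print(counts.toarray())            # integer counts per term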
