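Simple text data analysis with Python, covering:
1. Word segmentation (tokenization)
2. Building a corpus with TF-IDF weighting
3. LDA topic extraction model
4. Word vectorization with word2vec
Reference: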
http://zhuanlan.zhihu.com/textmining-experience/1963076
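Step 2 of the list above (build a corpus, then apply TF-IDF weighting) can be sketched in plain Python as follows. This is a hand-rolled illustration, not the author's code: the sample documents, the helper names `build_dictionary`, `doc2bow` and `tfidf`, and the choice of `tf * log2(N / df)` with L2 normalization are all assumptions made for the example. In practice gensim's `corpora.Dictionary` and `models.TfidfModel` would do this work, and the tokens would come from a segmenter such as jieba.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized documents. In a real pipeline these
# token lists would be produced by a word segmenter such as jieba.
docs = [
    ["text", "mining", "python"],
    ["python", "topic", "model"],
    ["topic", "model", "lda", "lda"],
]


def build_dictionary(docs):
    """Map each distinct token to an integer id (ids follow sorted order)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def doc2bow(doc, token2id):
    """Return sorted (token_id, count) pairs for one document."""
    counts = Counter(token2id[tok] for tok in doc if tok in token2id)
    return sorted(counts.items())


def tfidf(corpus):
    """Weight each bag-of-words vector by tf * log2(N / df), then L2-normalize."""
    n_docs = float(len(corpus))
    # Document frequency: in how many documents each token id appears.
    df = Counter(tid for bow in corpus for tid, _ in bow)
    weighted = []
    for bow in corpus:
        vec = [(tid, tf * math.log(n_docs / df[tid], 2)) for tid, tf in bow]
        # Tokens that occur in every document get weight 0 and are dropped.
        vec = [(tid, w) for tid, w in vec if w > 0]
        norm = math.sqrt(sum(w * w for _, w in vec))
        weighted.append([(tid, w / norm) for tid, w in vec] if norm else [])
    return weighted


token2id = build_dictionary(docs)
corpus = [doc2bow(doc, token2id) for doc in docs]
corpus_tfidf = tfidf(corpus)
```

Each entry of `corpus_tfidf` is a sparse vector of `(token_id, weight)` pairs. Tokens that are rare across the collection but frequent within a document (here `lda` in the third document) receive the largest weights, which is the property the later topic-modeling step relies on.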
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import MySQLdb
import pandas as pd
import pandas.io.sql as sql
import jieba
import nltk
import jieba.posseg as pseg
from gensim import corpora, models, similarities
import re
# import logging
# logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s',level=logging.INFO)
# reload(sys)
# sys.setdefaultencoding('utf-8')
if __name__ == '__main__':
    # Load the user dictionary
    jieba.load_user