
NLP01: A small example of building a Chinese word cloud with Python's wordcloud


[Image: word cloud generated from the poem below]

The image above was generated from the following poem:

《When You Are Old》
William Butler Yeats
When you are old and grey and full of sleep,
And nodding by the fire, take down this book,
And slowly read, and dream of the soft look
Your eyes had once, and of their shadows deep;
How many loved your moments of glad grace,
And loved your beauty with love false or true,
But one man loved the pilgrim soul in you,
And loved the sorrows of your changing face;
And bending down beside the glowing bars,
Murmur, a little sadly, how love fled
And paced upon the mountains overhead
And hid his face amid a crowd of stars.

Summary: this post only covers installing wordcloud and running a quick demo; it may be useful to beginners.

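1. Installation

There are many ways to build a word cloud, but in my opinion Python's wordcloud package is the most capable, and it lets you shape the cloud with a custom image.
Official site: https://amueller.github.io/word_cloud/
GitHub: https://github.com/amueller/word_cloud
Install: pip install wordcloud
Or download the package from http://www.lfd.uci.edu/~gohlke/pythonlibs/#wordcloud and install it manually.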
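
After installing, a quick check that the packages used later in this post can be found may save some confusion. The sketch below uses only the standard library. The helper name `is_installed` is made up for illustration, and the list of module names simply mirrors the imports used in the code section of this post.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Modules this post relies on; adjust the list to your own setup.
    for name in ("wordcloud", "jieba", "matplotlib"):
        if is_installed(name):
            print(name, "-> ok")
        else:
            print(name, "-> missing, try: pip install", name)
```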

2. A look at the API

The WordCloud class is the central class of the API.

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling=0.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)
font_path : string
    Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.
    [On Windows 7 this has to be changed to a font with Chinese glyphs; otherwise Chinese text comes out garbled.]
width : int (default=400)
    Width of the canvas.
height : int (default=200)
    Height of the canvas.
prefer_horizontal : float (default=0.90)
    The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)
mask : nd-array or None (default=None)
scale : float (default=1)
    Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.
min_font_size : int (default=4)
    Smallest font size to use. Will stop when there is no more room in this size.
font_step : int (default=1)
    Step size for the font. font_step > 1 might speed up computation but give a worse fit.
max_words : number (default=200)
    The maximum number of words.
stopwords : set of strings or None
    The words that will be eliminated. If None, the built-in STOPWORDS list will be used.
background_color : color value (default="black")
    Background color for the word cloud image.
max_font_size : int or None (default=None)
    Maximum font size for the largest word. If None, height of the image is used.
mode : string (default="RGB")
    Transparent background will be generated when mode is "RGBA" and background_color is None.
relative_scaling : float (default=.5)
    Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.
color_func : callable, default=None
    Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites "colormap". See colormap for specifying a matplotlib colormap instead.
regexp : string or None (optional)
    Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used.
collocations : bool, default=True
    Whether to include collocations (bigrams) of two words.
colormap : string or matplotlib colormap, default="viridis"
    Matplotlib colormap to randomly draw colors from for each word. Ignored if "color_func" is specified.
normalize_plurals : bool, default=True
    Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's', unless the word ends with 'ss'.
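
The default regexp quoted above goes a long way towards explaining why Chinese text has to be segmented before it is handed to WordCloud. The sketch below applies that pattern with the standard `re` module. The helper `tokenize` is a made-up name, and applying the pattern with `re.findall` is an assumption about how the library tokenizes, which may differ between versions. An unsegmented Chinese sentence comes back as one long token. Once the words are separated by spaces they become individual tokens, but a single-character word is dropped because the pattern needs at least two characters.

```python
import re

# Default token pattern as quoted in the parameter list above.
DEFAULT_TOKEN_PATTERN = r"\w[\w']+"


def tokenize(text, pattern=DEFAULT_TOKEN_PATTERN):
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    raw = "脚抽筋怎么办"             # unsegmented: one run of word characters
    pre_segmented = "脚 抽筋 怎么办"  # words separated by spaces
    print(tokenize(raw))            # ['脚抽筋怎么办']
    print(tokenize(pre_segmented))  # ['抽筋', '怎么办']
```

If single-character words matter for your text, passing a custom `regexp` is one option to experiment with.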

3. Image

The mask image is named mask_png.png.
[Image: mask_png.png]
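
If you do not have a suitable picture at hand, a mask can also be built as an array. In wordcloud, pure white mask pixels (255) are treated as masked out, and words are drawn only on the remaining pixels. The sketch below builds a circular mask with NumPy. The function name `circular_mask` and its parameters are made up for illustration.

```python
import numpy as np


def circular_mask(size=300, margin=20):
    """Build a square uint8 mask: 0 inside a centred circle, 255 (white) outside."""
    y, x = np.ogrid[:size, :size]
    centre = (size - 1) / 2.0
    radius = size / 2.0 - margin
    # True for every pixel that lies outside the circle.
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


if __name__ == "__main__":
    mask = circular_mask()
    print(mask.shape, mask.dtype)
```

The resulting array can be passed as the `mask` argument of `WordCloud` in place of an image loaded from disk.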

4. Chinese test document

Title: 脚抽筋怎么办 (What to do about leg cramps)
URL: http://health.china.com/html/jiankang/jijiuzhinan/richangjijiu/201603/26-328450.html

5. Code

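# -*- coding: utf-8 -*-
import jieba
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def doWordcloud():
    # Read the Chinese source text.
    with open('test.txt', 'r', encoding='UTF-8') as f:
        comment_text = f.read()
    # Segment the text with jieba and join the words with spaces so that
    # WordCloud can tokenize them.
    cut_text = " ".join(jieba.cut(comment_text))
    # Load the mask image as a NumPy array (scipy.misc.imread is not
    # available in current SciPy releases, so Pillow is used here).
    color_mask = np.array(Image.open("mask_png.png"))
    cloud = WordCloud(
        # Set a font that contains Chinese glyphs; without one the text is garbled.
        # On Windows 7, look in C:\Windows\Fonts.
        font_path="simsun.ttc",
        mask=color_mask,
        max_words=200,
        max_font_size=80,
        # width and height are ignored when a mask is given; the mask's shape is used.
        width=1000,
        height=1000
    )
    word_cloud = cloud.generate(cut_text)  # generate the word cloud
    # word_cloud.to_file("pic.jpg")  # save the image
    plt.imshow(word_cloud)
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    doWordcloud()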
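
The `font_path` value in the code above is tied to Windows. One possible way to make the script a little more portable is to try a few candidate font files and use the first one that exists. This is only a sketch: the helper `first_existing` is a made-up name, and every path in `CANDIDATE_FONTS` is an assumption about where a Chinese-capable font might live, so check your own system.

```python
import os

# Hypothetical candidate locations; none is guaranteed to exist on your machine.
CANDIDATE_FONTS = [
    r"C:\Windows\Fonts\simsun.ttc",                    # Windows, the font used in this post
    "/System/Library/Fonts/PingFang.ttc",              # macOS (assumed location)
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",  # Linux (assumed location)
]


def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for p in paths:
        if os.path.isfile(p):
            return p
    return None


if __name__ == "__main__":
    font = first_existing(CANDIDATE_FONTS)
    print("font_path to use:", font)
```

If the result is `None`, it is probably better to stop with a clear message than to let WordCloud fall back to a font without Chinese glyphs.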

Notes: test.txt contains the text of the article 《脚抽筋怎么办》;
mask_png.png is the picture of the little girl shown above.
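
The built-in STOPWORDS list of wordcloud is an English list, so common Chinese function words are not filtered out by default. One way to handle this is to count the segmented words yourself and drop unwanted ones first. The sketch below uses only the standard library. The stopword set and the helper `count_words` are made up for illustration, and in practice the tokens would come from `jieba.cut(...)` rather than from a hand-written list.

```python
from collections import Counter

# Illustrative stopword set; a real project would load a fuller list from a file.
CHINESE_STOPWORDS = {"的", "了", "是", "我们", "可以"}


def count_words(tokens, stopwords=CHINESE_STOPWORDS, min_length=2):
    """Count tokens, skipping stopwords, blanks and tokens shorter than min_length."""
    stripped = (t.strip() for t in tokens)
    return Counter(
        t for t in stripped if len(t) >= min_length and t not in stopwords
    )


if __name__ == "__main__":
    # Hand-written tokens standing in for the output of jieba.cut(text).
    tokens = ["抽筋", "的", "时候", "可以", "抽筋", " "]
    print(count_words(tokens))
```

The resulting counts can then be passed to `WordCloud.generate_from_frequencies`, which takes a mapping from word to frequency instead of raw text. Alternatively, the same set of words can be supplied through the `stopwords` parameter.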

6. Result

[Image: the resulting word cloud]

[Author: happyprince; http://blog.csdn.net/ld326/article/details/78341147]
