Detecting the language of a text
GitHub source: https://github.com/saffsd/langid.py
>>> import langid
# classify returns the single most likely language
>>> langid.classify("I do not speak english")
('en', 0.57133487679900674)
>>> langid.set_languages(['de','fr','it'])
>>> langid.classify("I do not speak english")
('it', 0.99999835791478453)
>>> langid.set_languages(['en','it'])
>>> langid.classify("I do not speak english")
('en', 0.99176190378750373)
# rank returns the candidate languages ordered by score
>>> langid.rank("I do not speak english")
[('en', -49.99176190378750373), ('pl', -48.99176190378750373), ...]
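The scores returned by rank above are negative, which suggests unnormalized log-probabilities rather than probabilities. Assuming that is the case, the following minimal sketch converts such scores into values that sum to 1 with a softmax. The helper name normalize_scores and the example scores are hypothetical and are not part of langid. The langid.py README also describes a built-in alternative, passing norm_probs=True to LanguageIdentifier.from_modelstring, which may be preferable in practice.

```python
import math


def normalize_scores(ranked):
    """Convert (lang, log_score) pairs into (lang, probability) pairs.

    Assumes the scores are unnormalized log-probabilities, so a softmax
    over all candidates yields values that sum to 1.
    """
    if not ranked:
        return []
    # Subtract the best score before exponentiating to avoid underflow.
    top = max(score for _, score in ranked)
    weights = [(lang, math.exp(score - top)) for lang, score in ranked]
    total = sum(weight for _, weight in weights)
    return [(lang, weight / total) for lang, weight in weights]


# Hypothetical scores for illustration only, not real langid output.
example = [("en", -49.99), ("pl", -52.30), ("de", -55.10)]
for lang, prob in normalize_scores(example):
    print(lang, round(prob, 3))
```

With langid installed, the same helper could be applied as normalize_scores(langid.rank(text)). The probabilities are relative to the languages in the list, so restricting the candidates with set_languages changes the result.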
TODO
langdetect: a Python port of the Java library language-detection (version of 03/03/2014)
GitHub source: https://github.com/Mimino666/langdetect
# detect returns the single most likely language
>>> from langdetect import detect
>>> detect("War doesn't show who's right, just who's left.")
'en'
>>> detect("Ein, zwei, drei, vier")
'de'
# detect_langs returns the candidate languages with their probabilities
>>> from langdetect import detect_langs
>>> detect_langs("Otec matka syn.")
[sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]
A language detector found on CSDN that is implemented by training an sklearn model; not tested here.
Multi-language detection