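LTP 4.0 was released in June. It uses a single model trained with multi-task learning. I tested it right away, and the results are indeed good.
GitHub link: https://github.com/HIT-SCIR/ltp
1. First pull the Docker image. This setup uses PyTorch 1.4 and Python 3.7.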
https://hub.docker.com/r/pytorch/pytorch/tags
docker pull pytorch/pytorch:1.4-cuda10.1-cudnn7-devel
2. Start a container with docker run (76c152fbfd03 is the local image ID):
nvidia-docker run -p 8889:8888 --name torch_py3 -it -v /data/:/data 76c152fbfd03
3. Inside the container, run pip install ltp:
Successfully built fire cytoolz termcolor toolz sacremoses
Installing collected packages: termcolor, fire, toolz, cytoolz, sentencepiece, torchtext, toml, regex, pyparsing, packaging, click, joblib, sacremoses, tokenizers, transformers, pytorch-ranger, torch-optimizer, ltp
Successfully installed click-7.1.2 cytoolz-0.10.1 fire-0.3.1 joblib-0.16.0 ltp-4.0.4 packaging-20.4 pyparsing-2.4.7 pytorch-ranger-0.1.1 regex-2020.7.14 sacremoses-0.0.43 sentencepiece-0.1.91 termcolor-1.1.0 tokenizers-0.8.1rc1 toml-0.10.1 toolz-0.10.0 torch-optimizer-0.0.1a14 torchtext-0.5.0 transformers-3.0.2
4. A quick test
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 20 11:44:27 2020
@author:
"""
from ltp import LTP
# ltp = LTP()  # loads the Small model by default
ltp = LTP(path="base")
# ltp = LTP(path="base|small|tiny")
# ltp = LTP(path="tiny.tgz|tiny-tgz-extracted")  # tiny-tgz-extracted is the folder unpacked from tiny.tgz
# sent_list = ltp.sent_split(inputs, flag="all", limit=510)
# ltp.init_dict(path="user_dict.txt", max_window=4)
# ltp.add_words(words=["负重前行", "长江大桥"], max_window=4)
segment, hidden = ltp.seg(["他叫汤姆去拿外衣。"])
pos = ltp.pos(hidden)
ner = ltp.ner(hidden)
srl = ltp.srl(hidden)
dep = ltp.dep(hidden)
sdp = ltp.sdp(hidden)
print(srl)
The first run downloads the model:
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 558M/558M [07:16<00:00, 1.28MB/s]
[[[], [('A0', 0, 0), ('A1', 2, 2), ('A2', 3, 5)], [], [], [('A0', 2, 2), ('A1', 5, 5)], [], []]]
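The printed SRL result is easier to read once the index spans are mapped back to words. The sketch below is not part of LTP. It assumes that each position in the inner list corresponds to one token acting as a predicate, that each tuple is (role, start, end) with an inclusive end index, and that the sentence was segmented into the seven tokens listed in words (the post does not print the segmentation, so that list is an assumption consistent with the seven entries in the output). decode_srl is a hypothetical helper name.

```python
# Assumed segmentation of the test sentence (not printed in the post).
words = ["他", "叫", "汤姆", "去", "拿", "外衣", "。"]

# SRL result for the single input sentence, copied from the output above.
srl = [[], [("A0", 0, 0), ("A1", 2, 2), ("A2", 3, 5)], [], [],
       [("A0", 2, 2), ("A1", 5, 5)], [], []]


def decode_srl(words, srl_sentence):
    """Map each predicate token to its (role, text) arguments.

    Assumes the outer index is the predicate's token index and that
    the end index of every span is inclusive.
    """
    decoded = {}
    for idx, args in enumerate(srl_sentence):
        if not args:
            continue  # this token is not a predicate
        decoded[words[idx]] = [
            (role, "".join(words[start:end + 1])) for role, start, end in args
        ]
    return decoded


decoded = decode_srl(words, srl)
for predicate, args in decoded.items():
    print(predicate, args)
```

Under these assumptions, the predicate 叫 takes 他 as A0, 汤姆 as A1 and 去拿外衣 as A2, while 拿 takes 汤姆 as A0 and 外衣 as A1.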
Method 2: using the source code
1. Clone the source from GitHub:
https://github.com/HIT-SCIR/ltp
2. Then, from the ltp directory, use from ltp import LTP directly. The model download URLs are:
'base': 'http://39.96.43.154/ltp/v2/base.tgz',
'small': 'http://39.96.43.154/ltp/v2/small.tgz',
'tiny': 'http://39.96.43.154/ltp/v2/tiny.tgz'
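Fetching and unpacking one of these archives into a local folder can be sketched with the standard library alone. This is a rough sketch, not the way LTP itself downloads models: fetch_model and the target folder layout are hypothetical, the URLs are the ones listed above and may no longer be reachable, and the internal layout of the archives is not verified here.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the list above; availability is not guaranteed.
MODEL_URLS = {
    "base": "http://39.96.43.154/ltp/v2/base.tgz",
    "small": "http://39.96.43.154/ltp/v2/small.tgz",
    "tiny": "http://39.96.43.154/ltp/v2/tiny.tgz",
}


def fetch_model(name, dest_dir):
    """Download <name>.tgz into dest_dir (if missing) and unpack it.

    Returns the folder the archive was unpacked into (dest_dir/name).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{name}.tgz"
    if not archive.exists():
        # Only hits the network when the archive is not already present.
        urllib.request.urlretrieve(MODEL_URLS[name], str(archive))
    target = dest / name
    target.mkdir(exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target
```

Calling fetch_model("base", "models") would leave the unpacked files under models/base, and that folder (or whichever subfolder actually holds the model files) is what gets passed as LTP(path=...).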
3. Run the test code inside the Docker container:
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 20 11:44:27 2020
@author: wangby21
"""
import os
import sys
sys.path.append(os.path.dirname(os.getcwd()))
from ltp import LTP
# ltp = LTP()  # loads the Small model by default
ltp = LTP(path="ltp_learn/ltp/base_model/")  # local folder holding the unpacked base model
# ltp = LTP(path="base|small|tiny")
# ltp = LTP(path="tiny.tgz|tiny-tgz-extracted")  # tiny-tgz-extracted is the folder unpacked from tiny.tgz
# sent_list = ltp.sent_split(inputs, flag="all", limit=510)
# ltp.init_dict(path="user_dict.txt", max_window=4)
ltp.add_words(words=["南京市长","长江大桥"], max_window=4)
segment, hidden = ltp.seg(["南京市长江大桥很长"])
print(segment)
pos = ltp.pos(hidden)
ner = ltp.ner(hidden)
srl = ltp.srl(hidden)
dep = ltp.dep(hidden)
sdp = ltp.sdp(hidden)
print(segment,srl)
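The test sentence is a useful check for the user dictionary because the two added words compete for the same character. The sketch below is independent of LTP and only locates both words in the sentence. find_span is a hypothetical helper, and nothing here predicts which word the segmenter will actually prefer.

```python
sentence = "南京市长江大桥很长"
user_words = ["南京市长", "长江大桥"]


def find_span(text, word):
    """Return the (start, end) character span of word in text, end exclusive."""
    start = text.find(word)
    return None if start < 0 else (start, start + len(word))


spans = {word: find_span(sentence, word) for word in user_words}
(a_start, a_end), (b_start, b_end) = spans["南京市长"], spans["长江大桥"]

# Characters claimed by both dictionary words.
overlap = sentence[max(a_start, b_start):min(a_end, b_end)]
print(spans, repr(overlap))
```

Both words claim the character 长 at index 3, so a segmentation can contain at most one of them. Printing segment shows which one the model chose.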