Fine-tuning the Paddle UIE Model for Named Entity Extraction

1. Create a virtual environment

As good practice, first create a dedicated runtime environment.

conda create -n uie python=3.10.9
conda activate uie

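2. Install the Paddle framework and PaddleNLP

2.1 Install Paddle following the official documentation

Official guide: "Getting Started" (开始使用), PaddlePaddle (飞桨), an open-source deep learning platform originating from industrial practice

First check the CUDA version on your server. As shown below, mine is 10.2. Note that the "CUDA Version" field printed by nvidia-smi is the highest CUDA version the installed driver supports.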

(PyTorch-1.8) [ma-user work]$nvidia-smi
Wed Apr 19 23:35:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:0E.0 Off |                    0 |
| N/A   39C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Then simply copy the install command that matches this CUDA version from the official Paddle website.
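If you want to read the CUDA version programmatically rather than by eye, a minimal sketch is shown below. It only parses the header line that nvidia-smi printed above; the function name `parse_cuda_version` and the constant `SAMPLE_HEADER` are illustrative, not part of any Paddle tooling.

```python
import re
from typing import Optional

# Header line copied from the nvidia-smi output shown above.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
)


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Return the 'CUDA Version' value found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


print(parse_cuda_version(SAMPLE_HEADER))  # 10.2
```

On a real machine the input text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`; the parsed value is then what you match against the options on the Paddle install page.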

2.2 Install PaddleNLP

pip install --upgrade paddlenlp

2.2.1 Issue 1: ERROR: Failed building wheel for numpy

-x86_64-3.10/numpy/core/src/multiarray/scalartypes.o -MMD -MF build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/scalartypes.o.d" failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip

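Installing numpy manually and then rerunning the PaddleNLP install still failed:

pip install numpy

Switching to a different approach, installing through the Baidu PyPI mirror, succeeded:

python3 -m pip install --upgrade paddlenlp -i https://mirror.baidu.com/pypi/simple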
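After the install finishes, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration, and `paddlepaddle-gpu` is assumed to be the distribution name of the GPU build (the CPU build is published as `paddlepaddle`).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to what you installed.
for name in ("numpy", "paddlepaddle-gpu", "paddlenlp"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```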

3. Download the PaddleNLP source code

git clone https://github.com/PaddlePaddle/PaddleNLP.git

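4. Run training

4.1 Preprocess the annotated data

The doccano.py script converts the doccano export (data.json) into training, dev, and test files in the current directory, split 0.7 / 0.2 / 0.1:

python ../PaddleNLP/model_zoo/uie/doccano.py --doccano_file ./data.json --task_type ext --save_dir ./ --splits 0.7 0.2 0.1 --schema_lang ch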
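To make the preprocessing step less of a black box, here is a rough sketch of the idea behind it: each annotated record becomes one prompt-style example per entity label, and the examples are then cut into three parts by ratio. This is not the code of doccano.py. The record layout (`text`, `entities`, `start_offset`, `end_offset`, `label`) and the output keys (`content`, `result_list`, `prompt`) are assumptions based on the doccano export and UIE training formats, and the real script additionally shuffles the data and generates negative examples, which are omitted here.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical doccano record, hand-written for illustration.
SAMPLE_RECORD = {
    "id": 1,
    "text": "我想查询2022年山东省主营业务收入数据",
    "entities": [
        {"id": 0, "label": "时间", "start_offset": 4, "end_offset": 9},
        {"id": 1, "label": "地区", "start_offset": 9, "end_offset": 12},
        {"id": 2, "label": "指标名", "start_offset": 12, "end_offset": 18},
    ],
    "relations": [],
}


def doccano_to_prompt_examples(record: Dict) -> List[Dict]:
    """Turn one record into one prompt-style example per entity label."""
    text = record["text"]
    by_label: Dict[str, List[Dict]] = {}
    for ent in record["entities"]:
        start, end = ent["start_offset"], ent["end_offset"]
        by_label.setdefault(ent["label"], []).append(
            {"text": text[start:end], "start": start, "end": end}
        )
    return [
        {"content": text, "result_list": spans, "prompt": label}
        for label, spans in by_label.items()
    ]


def split_examples(items: List, ratios: Tuple[float, float, float]) -> Tuple[List, List, List]:
    """Split items into train/dev/test by cumulative ratios, without shuffling."""
    n = len(items)
    p1 = round(n * ratios[0])
    p2 = round(n * (ratios[0] + ratios[1]))
    return items[:p1], items[p1:p2], items[p2:]


for example in doccano_to_prompt_examples(SAMPLE_RECORD):
    print(json.dumps(example, ensure_ascii=False))

train, dev, test = split_examples(list(range(10)), (0.7, 0.2, 0.1))
print(len(train), len(dev), len(test))  # 7 2 1
```

The split uses `round` rather than `int` because `0.7 + 0.2` is slightly below 0.9 in floating point, so truncating would shift one item from dev to test.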

4.2 Fine-tune the model

# Output directory; assumed here to match the task_path used in section 5.
export finetuned_model=./checkpoint/model_best

python ../PaddleNLP/model_zoo/uie/finetune.py \
    --device gpu \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --seed 42 \
    --model_name_or_path uie-base \
    --output_dir $finetuned_model \
    --train_path ./train.txt \
    --dev_path ./dev.txt \
    --max_seq_length 512 \
    --per_device_eval_batch_size 16 \
    --per_device_train_batch_size 16 \
    --num_train_epochs 20 \
    --learning_rate 1e-5 \
    --label_names "start_positions" "end_positions" \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir $finetuned_model \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end True \
    --save_total_limit 1

If output like the following appears, training has succeeded:

[Figure: training log output]
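To estimate how long the run above will be before launching it, the step counts can be worked out from the command-line values. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; the function name `training_schedule` and the example size of 700 training lines are hypothetical.

```python
import math


def training_schedule(num_examples: int, batch_size: int = 16,
                      epochs: int = 20, eval_steps: int = 100) -> dict:
    """Estimate optimizer steps and evaluation count for one fine-tuning run."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": total_steps // eval_steps,
    }


# 700 is a made-up number of lines in train.txt, not a figure from this article.
print(training_schedule(700))
# {'steps_per_epoch': 44, 'total_steps': 880, 'evaluations': 8}
```

Because `--save_total_limit 1` keeps a single checkpoint and `--load_best_model_at_end True` is set, each of these evaluations is a chance for the best `eval_f1` checkpoint to be replaced.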

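5. Use the model

from pprint import pprint
from paddlenlp import Taskflow

# Entity types to extract: time, region, indicator name
schema = ['时间', '地区', '指标名']
# task_path points at the fine-tuned checkpoint produced in section 4.2
ie = Taskflow('information_extraction', schema=schema, task_path="./checkpoint/model_best")
# Query: "I want to look up Shandong Province's 2022 main business revenue data"
pprint(ie("我想查询2022年山东省主营业务收入数据"))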
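The extraction result is typically a list with one dictionary per input text, mapping each schema label to a list of spans with `text`, `start`, `end`, and `probability` fields. Assuming that shape, the sketch below reduces a result to the single best span per label. `SAMPLE_RESULT` is hand-written with placeholder probabilities and is not real model output, and `best_spans` is an illustrative helper rather than a PaddleNLP function.

```python
from typing import Dict, List

# Hand-written sample in the assumed output shape; probabilities are placeholders.
SAMPLE_RESULT = [
    {
        "时间": [{"text": "2022年", "start": 4, "end": 9, "probability": 0.99}],
        "地区": [{"text": "山东省", "start": 9, "end": 12, "probability": 0.98}],
        "指标名": [{"text": "主营业务收入", "start": 12, "end": 18, "probability": 0.40}],
    }
]


def best_spans(result: Dict[str, List[Dict]], threshold: float = 0.5) -> Dict[str, str]:
    """Keep the highest-probability span per label, dropping any below threshold."""
    picked: Dict[str, str] = {}
    for label, spans in result.items():
        if not spans:
            continue
        top = max(spans, key=lambda span: span["probability"])
        if top["probability"] >= threshold:
            picked[label] = top["text"]
    return picked


print(best_spans(SAMPLE_RESULT[0]))
# {'时间': '2022年', '地区': '山东省'}
```

With the real model you would pass `ie(text)[0]` instead of `SAMPLE_RESULT[0]`; a threshold like this is one simple way to turn the extraction into query parameters while ignoring low-confidence spans.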
