This post uses the BERT module from kashgari (a library that wraps BERT so you can assemble models quickly) to build a model of your own: take the pretrained BERT weights, feed in your own annotated data, and keep training.
You'll need to download the Chinese BERT pretrained model (chinese_L-12_H-768_A-12) yourself.
The code is straightforward: load the data, then fine-tune on top of the pretrained model.
```python
from kashgari.tasks.seq_labeling import BLSTMCRFModel
from kashgari.embeddings import BERTEmbedding

# Uncomment to force CPU-only training:
# import os
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = ""


def get_sequence_tagging_data(file_path):
    """Load token/tag pairs; a 4-field line marks a sentence boundary."""
    data_x, data_y = [], []
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.read().splitlines()
    x, y = [], []
    for line in lines:
        rows = line.split(' ')
        if len(rows) == 4:  # separator line between sentences in this data set
            if x:  # guard against consecutive separators
                data_x.append(x)
                data_y.append(y)
            x, y = [], []
        elif len(rows) >= 2:  # a "token tag" line; skip anything malformed
            x.append(rows[0])
            y.append(rows[1])
    if x:  # keep the last sentence even without a trailing separator
        data_x.append(x)
        data_y.append(y)
    return data_x, data_y
```
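Reading the loader back: the training file holds one token and one tag per line, separated by a single space, and any line with four space-separated fields (for example a CoNLL-style document marker) acts as a sentence separator. A hypothetical excerpt, with an assumed B/I/O tag set:

```
刘 B-PER
若 I-PER
英 I-PER
唱 O
歌 O
-DOCSTART- -X- -X- O
```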
```python
# load the annotated data (pick whichever file you exported)
train_x, train_y = get_sequence_tagging_data('training_data_bert_train.txt')
# train_x, train_y = get_sequence_tagging_data('../new_note.txt')
print(f"train data count: {len(train_x)}")

# model training: BERT checkpoint directory, sequence length 40
embedding = BERTEmbedding('/pvc/train/chinese_L-12_H-768_A-12', 40)
model = BLSTMCRFModel(embedding)
model.fit(train_x,
          train_y,
          validation_split=0.4,
          epochs=10,
          batch_size=32)
print('model_save')
model.save('../model_save/ner_model')
```
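Before saving it off for good, it's worth scoring the model on held-out data. A minimal sketch, assuming a test file in the same format as the training file; the file name here is hypothetical, and evaluate follows the kashgari 0.x seq_labeling API:

```python
# Hypothetical held-out set in the same "token tag" format as the training file.
test_x, test_y = get_sequence_tagging_data('training_data_bert_test.txt')
model.evaluate(test_x, test_y)  # prints a per-entity precision/recall/F1 report
```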
Finally, a short snippet for testing the saved model:
```python
load_model = BLSTMCRFModel.load_model('../model_save/ner_model')
print(load_model.predict("刘若英语怎么样"))
```
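predict returns one tag per input character, so you still have to merge the BIO spans to get entity strings. A small helper sketch; the B-PER/I-PER tag names are an assumption about your label set, and depending on the kashgari version predict may want an explicit token list rather than a raw string:

```python
def extract_entities(chars, tags):
    """Merge a BIO tag sequence into (entity_text, entity_type) tuples."""
    entities, buf, etype = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith('B-'):            # a new entity starts here
            if buf:
                entities.append((''.join(buf), etype))
            buf, etype = [ch], tag[2:]
        elif tag.startswith('I-') and buf:  # current entity continues
            buf.append(ch)
        else:                               # 'O' or a dangling I- closes the span
            if buf:
                entities.append((''.join(buf), etype))
            buf, etype = [], None
    if buf:
        entities.append((''.join(buf), etype))
    return entities

sentence = "刘若英语怎么样"
tags = load_model.predict(sentence)  # shape of the call varies by version
print(extract_entities(sentence, tags))
```

The test sentence is a deliberate trap: it reads either as 刘若英 (the singer) followed by 语怎么样, or as 刘若 plus 英语 ("English"), so it's a nice check of whether the model has learned name boundaries.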
That completes a quick walk through the pipeline, from annotating the training data to training the NER model.
And with that, this NER installment is done ~ *runs away*