
Saving a spaCy NER model after each training iteration


I am trying to save my custom spaCy NER model after each iteration. Is there an API, similar to what TensorFlow offers, for saving the model weights after each iteration (or after a given number of iterations)? I could then reload the saved model and continue training from there.

Also, how can I use all the cores on my Linux system? I found that only two of the four cores are being used. spaCy uses a multi-task CNN for NER, which I know takes more time to retrain on a CPU. Are there other ways to speed up training of the NER model? My training code is below:

import random
from pathlib import Path

import plac
import spacy

# example training data (format taken from spaCy's train_ner example)
TRAIN_DATA = [
    ('Who is Shaka Khan?', {'entities': [(7, 17, 'PERSON')]}),
    ('I like London and Berlin.', {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]}),
]


@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int))
def main(model=None, output_dir=None, n_iter=100):
    """Load the model, set up the pipeline and train the entity recognizer."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")

    # create the built-in NER component if it isn't in the pipeline yet
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    # otherwise, get it so we can add labels
    else:
        ner = nlp.get_pipe('ner')

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)


if __name__ == '__main__':
    plac.call(main)
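
spaCy's training-loop API has no built-in TensorFlow-style checkpoint callback, but the same effect comes from simply calling nlp.to_disk() inside the loop. Below is a minimal sketch of the modified loop, reusing nlp, optimizer, n_iter and TRAIN_DATA from the script above; the 'checkpoints' directory name is my own placeholder, not part of the original script:

    from pathlib import Path

    checkpoint_root = Path('checkpoints')  # illustrative directory name
    checkpoint_root.mkdir(exist_ok=True)

    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], drop=0.5,
                       sgd=optimizer, losses=losses)
        print(itn, losses)
        # serialise the whole pipeline after every iteration; each
        # checkpoint can later be reloaded with spacy.load()
        nlp.to_disk(checkpoint_root / ('model_iter_%d' % itn))

Serialising a full pipeline every iteration can be slow for large models; saving every N iterations (e.g. guarded by `if itn % save_every == 0`) is a common compromise.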

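To pick up training where a checkpoint left off, reload it and create the optimizer with resume_training() rather than begin_training(), which would re-initialise the weights (resume_training() is available in spaCy v2; check your version). On speed: as far as I know, spaCy v2's NER updates are essentially single-threaded on the CPU (any extra core usage comes from the underlying BLAS library), so the usual lever is larger batches rather than more cores. A sketch assuming spaCy v2, the TRAIN_DATA defined above, and the illustrative checkpoint path from the previous snippet:

    import random

    import spacy
    from spacy.util import minibatch, compounding

    nlp = spacy.load('checkpoints/model_iter_9')  # illustrative checkpoint path
    optimizer = nlp.resume_training()  # keeps the trained weights

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        for itn in range(10):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch updates with a compounding batch size (4 -> 32),
            # usually much faster than updating one example at a time
            for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, drop=0.5,
                           sgd=optimizer, losses=losses)
            print(itn, losses)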