I recently pretrained a BertForMaskedLM model and needed to fine-tune it on a classification task with BertForSequenceClassification.
During fine-tuning, the following error appeared when loading the optimizer state:
optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
File "/var/lib/docker/lib/python3.8/site-packages/torch/optim/optimizer.py", line 145, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
The problem is that the optimizer's parameter groups do not match the parameters contained in the optimizer state loaded from optimizer.pt. That optimizer.pt was saved by the pretraining run, while my current optimizer is built over the classification model's parameters, which can be listed like this:
model_param_list = [p[0] for p in model.named_parameters()]
print(model_param_list)
# The printed names include 'classifier.weight' and 'classifier.bias', the two parameters required by the classification head
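One quick way to confirm the mismatch before fixing it is to compare how many parameters each group holds in the saved state versus in the current optimizer. This is a minimal sketch; it assumes optimizer.pt holds a plain optimizer state_dict and that optimizer is the optimizer built for the classification model:

import os
import torch

# State saved by the pretraining run (assumes optimizer.pt is a plain optimizer state_dict)
saved_state = torch.load(os.path.join(args.model_name_or_path, "optimizer.pt"))

# Parameter counts per group in the saved state vs. in the current optimizer
saved_sizes = [len(g["params"]) for g in saved_state["param_groups"]]
current_sizes = [len(g["params"]) for g in optimizer.param_groups]

print("saved:", saved_sizes)
print("current:", current_sizes)
# If any pair of sizes differs, load_state_dict() raises the ValueError shown above.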
The current model is BertForSequenceClassification, whose parameters differ from the pretrained model's, so loading the optimizer.pt produced for BertForMaskedLM inevitably causes this mismatch. The fix is to define the optimizer's parameter groups manually instead of loading the saved state, as follows:
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    # Parameters that should receive weight decay
    {"params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], "weight_decay": args.weight_decay},
    # Biases and LayerNorm weights are excluded from weight decay
    {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
]
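These groups then go straight into a fresh optimizer rather than restoring the old state. A minimal sketch, assuming torch.optim.AdamW and that args carries learning_rate and adam_epsilon in the same style as the args.weight_decay used above:

from torch.optim import AdamW

# Build a fresh optimizer from the custom groups; do not call load_state_dict on optimizer.pt
optimizer = AdamW(
    optimizer_grouped_parameters,
    lr=args.learning_rate,   # assumed training argument
    eps=args.adam_epsilon,   # assumed training argument
)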
You can then use optimizer.state_dict() to inspect the optimizer's state at this point.
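For reference, the returned dictionary follows the standard torch.optim layout: a param_groups list mirroring the two groups defined above, and a state dict of per-parameter buffers that stays empty until the first optimizer.step():

state = optimizer.state_dict()

# Two groups: one with weight decay, one without
for i, group in enumerate(state["param_groups"]):
    print(i, "weight_decay =", group["weight_decay"], "num params =", len(group["params"]))

# Per-parameter buffers (e.g. Adam moments); empty before any optimizer.step()
print("state entries:", len(state["state"]))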