Recently I was fine-tuning bloomz with peft's LoRA, following https://github.com/linhduongtuan/BLOOM-LORA. After one epoch of training, the model's outputs hadn't changed at all, and testing against several checkpoints showed the same thing: no change whatsoever.
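For context, the setup looks roughly like this (a minimal sketch assuming the standard peft API and the bigscience/bloomz-7b1 base model; the hyperparameters are illustrative, not necessarily the ones from my run):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model (illustrative checkpoint name)
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Standard LoRA config; "query_key_value" is BLOOM's fused attention projection
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # should report a few million trainable params
```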
I was about to lose my mind.
You can see that adapter_model.bin in the checkpoint is only 4 KB, which obviously means nothing was saved (a quick check follows the listing):
```
4.0K    ./bloomz7b1-patent-full/checkpoint-2600/adapter_config.json
4.0K    ./bloomz7b1-patent-full/checkpoint-2600/adapter_model.bin
 31M    ./bloomz7b1-patent-full/checkpoint-2600/optimizer.pt
 16K    ./bloomz7b1-patent-full/checkpoint-2600/rng_state.pth
4.0K    ./bloomz7b1-patent-full/checkpoint-2600/scheduler.pt
 20K    ./bloomz7b1-patent-full/checkpoint-2600/trainer_state.json
4.0K    ./bloomz7b1-patent-full/checkpoint-2600/training_args.bin
```
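You can confirm the file is effectively empty by loading it directly (a sketch, using the checkpoint path from the listing above):

```python
import torch

# Inspect what actually landed in the suspect adapter file
sd = torch.load(
    "./bloomz7b1-patent-full/checkpoint-2600/adapter_model.bin",
    map_location="cpu",
)
print(len(sd), "tensors in adapter_model.bin")  # 0 here: nothing was saved
for name, tensor in sd.items():
    print(name, tuple(tensor.shape))
```

Note that optimizer.pt is a healthy 31 MB (Adam keeps two fp32 states per trainable parameter, which is consistent with a few million LoRA parameters), so training itself was running; only the save path was broken.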
Frantically hunting for the cause
Two posts were useful as references:
https://github.com/huggingface/peft/issues/503 and https://github.com/huggingface/peft/issues/286 ("model.save_pretrained() produced a corrupted adapter_model.bin (only 443 B) with alpaca-lora")
Both deal with the save path and the load path being inconsistent with each other.
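Independent of Trainer's checkpointing, you can dump the LoRA weights yourself with peft's get_peft_model_state_dict helper, which is a handy way to tell whether the problem is in the model or in the save path (a sketch; model is the PEFT-wrapped model from above):

```python
import torch
from peft import get_peft_model_state_dict

# Extract only the LoRA tensors from the wrapped model
lora_sd = get_peft_model_state_dict(model)
print(len(lora_sd), "LoRA tensors")  # should be well above zero
torch.save(lora_sd, "adapter_model.bin")
```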
But in the end, my fix was simply to comment out these lines (the state_dict monkey-patch that alpaca-lora-style training scripts install before training):
```python
# old_state_dict = model.state_dict
# model.state_dict = (
#     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
# ).__get__(model, type(model))
```
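Why does this work? My best understanding, based on those two issues: the patch replaces model.state_dict so it returns an already-filtered LoRA state dict with the adapter name stripped from the keys; newer peft versions filter again inside model.save_pretrained() (matching on "lora_" and the adapter name), find nothing, and write out an empty dict, hence the 4 KB file. With the patch removed, save_pretrained() sees the full state dict and extracts the adapter weights itself. A quick post-fix check (a sketch; the output path is illustrative):

```python
import os
import torch

# With the monkey-patch gone, the normal peft save path works again
model.save_pretrained("bloomz7b1-patent-full")

# adapter_model.bin should now be several MB, not 4.0K
path = "bloomz7b1-patent-full/adapter_model.bin"
print(f"{os.path.getsize(path) / 2**20:.1f} MiB")

sd = torch.load(path, map_location="cpu")
assert len(sd) > 0, "adapter state dict is empty again"
```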