
[Paper Notes] Pai-Megatron Qwen1.5-14B-CT continued pretraining: pitfall log


1. Model weight conversion error in hf2mcore_1.5_v2.py

The error is raised by this script:

/mnt/cpfs/kexin/dlc_code/qwen1.5/PAI-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen/hf2mcore_1.5_v2.py

The corrected file is given below. Line 477 was changed to drop the args.hidden_size dimension from the view call, so the conversion also works when tp > 1 (a toy reproduction of the shape mismatch follows the diff):

```python
elif 'linear_qkv.bias' in k and 'norm' not in k:
    # raw
    viewed = v.view(args.num_query_groups, -1, head_dim, args.hidden_size)
    # changed
    viewed = v.view(args.num_query_groups, -1, head_dim)
```
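Why the original line 477 fails: linear_qkv.bias is a 1-D tensor with one value per fused QKV output row, so, unlike the weight matrix, it has no hidden_size axis. Appending args.hidden_size to the view makes the element count impossible to satisfy, particularly once tensor parallelism splits the rows. Below is a minimal sketch of the shape mismatch; the sizes are made up for illustration and are not Qwen1.5-14B's real config.

```python
import torch

# Hypothetical per-rank sizes, for illustration only.
num_query_groups = 4   # KV groups on this TP rank
heads_per_group = 2    # query heads per KV group
head_dim = 8
hidden_size = 64

# linear_qkv.bias is 1-D: one scalar per output row of the fused QKV projection.
qkv_rows = num_query_groups * (heads_per_group + 2) * head_dim
bias = torch.randn(qkv_rows)

# Original line 477: treats the bias like the 2-D weight and appends hidden_size.
try:
    bias.view(num_query_groups, -1, head_dim, hidden_size)
except RuntimeError as e:
    print("raw view fails:", e)

# Fixed line: the bias has no hidden_size axis, so that dimension is dropped.
viewed = bias.view(num_query_groups, -1, head_dim)
print(viewed.shape)  # torch.Size([4, 4, 8])
```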

Replace the file contents with:

```python
import os
import re
import json
import torch
import transformers
import torch.nn as nn
from functools import partial
from collections import defaultdict
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)
from transformers.models.mixtral  # (import statement and the rest of the file are truncated in the original post)
```