LLM fine-tuning errors, part 2_you can deactivate exllama backend by setting `dis

Author: 小蓝xlanll | 2024-04-24 02:55:52


you can deactivate exllama backend by setting `disable_exllama=true` in the

Fine-tuning the Qwen15-05B-Chat-GPTQ-Int4 model.
Training uses the Qwen1.5 SFT script.
Command:

    python finetune.py --model_name_or_path /llm/Qwen15-05B-Chat-GPTQ-Int4 \
        --output_dir ./checkpoints \
        --model_max_length 512 \
        --data_path /data/agi/dataset/train_0.5M_CN/output600.jsonl \
        --use_lora True \
        --per_device_train_batch_size 1 \
        --q_lora True \
        --learning_rate 5e-4

Running it fails with:
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
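The message says that some modules were placed on CPU or disk rather than on the GPU. A minimal sketch for listing them, assuming the model was loaded with a `device_map` so that `model.hf_device_map` is populated; the helper name `offloaded_modules` and the sample map are hypothetical:

```python
def offloaded_modules(device_map):
    """Return the sorted names of modules that were placed on CPU or disk."""
    return sorted(
        name
        for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )


# Hypothetical device map for illustration; in practice pass model.hf_device_map.
example_map = {"model.embed_tokens": 0, "model.layers.0": "cpu", "lm_head": "disk"}
print(offloaded_modules(example_map))
```

If the list is non-empty, the exllama backend cannot be used for that placement, which matches the error above.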
Fix:
1) Modify finetune.py.

    # GPTQConfig comes from transformers (from transformers import GPTQConfig)
    model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        config=config,
        cache_dir=training_args.cache_dir,
        device_map=device_map,
        quantization_config=GPTQConfig(
            bits=4,
            disable_exllama=True,  # added: turn off the exllama backend
        ),
    )
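Depending on the installed transformers version, the parameter may be named `use_exllama` instead, since newer releases deprecate `disable_exllama`. A hedged sketch that picks whichever keyword the config class accepts by inspecting its signature; the helper name `exllama_off_kwargs` is hypothetical and not part of finetune.py:

```python
import inspect


def exllama_off_kwargs(config_cls):
    """Return the keyword argument that turns exllama off for config_cls."""
    params = inspect.signature(config_cls.__init__).parameters
    if "use_exllama" in params:
        # Newer naming: use_exllama=False
        return {"use_exllama": False}
    # Older naming: disable_exllama=True
    return {"disable_exllama": True}


# Intended use inside finetune.py (assumes transformers is installed):
#   from transformers import GPTQConfig
#   quantization_config = GPTQConfig(bits=4, **exllama_off_kwargs(GPTQConfig))
```

This keeps the same change working across versions without hard-coding a version number.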

If it still does not run correctly, edit config.json in the model directory:

    "quantization_config": {
        ……
        "use_exllama": false
    }
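The same edit can be scripted instead of made by hand. A minimal sketch using only the standard library; the function name `disable_exllama_in_config` is hypothetical, and it assumes config.json already contains a `quantization_config` section (it raises `KeyError` otherwise):

```python
import json
from pathlib import Path


def disable_exllama_in_config(config_path):
    """Set quantization_config.use_exllama to false in a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    quant = config["quantization_config"]  # KeyError if the model is not quantized
    quant["use_exllama"] = False
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return quant


# Example, using the model directory from the command above:
#   disable_exllama_in_config("/llm/Qwen15-05B-Chat-GPTQ-Int4/config.json")
```

It is worth backing up config.json first, since the file is rewritten in place.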

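After that it should run normally.
I then hit another error:
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
To fix this one, modify finetune.py:

    model = get_peft_model(model, lora_config)
    # Launched with plain python (training on CPU): keep model.float().
    # Launched with deepspeed: comment out the model.float() line below.
    model.float()

Training on the CPU this way takes far longer than training on a GPU.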
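The error indicates that a half-precision matrix multiply was attempted on the CPU, where that kernel was not available, so casting the model to float32 avoids it. Rather than commenting the line in and out by hand, the cast can be made conditional. This is only a sketch: using GPU availability as the switch is an assumption, not the author's plain-python versus deepspeed distinction, and `cast_for_device` is a hypothetical helper:

```python
def cast_for_device(model, cuda_available):
    """Cast the model to float32 only when no GPU is available."""
    if not cuda_available:
        # nn.Module.float() casts floating-point parameters and buffers in place
        model.float()
    return model


# Intended use in finetune.py (assumes torch is installed):
#   import torch
#   model = get_peft_model(model, lora_config)
#   model = cast_for_device(model, torch.cuda.is_available())
```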
