A typical personal computer cannot realistically run the Mixtral 8x7B large language model in float16, so the weights have to be loaded in 4-bit or 8-bit instead.
In actual testing, inference became noticeably faster in 4-bit, while 8-bit inference was still very slow.
The inference framework used is FastChat.
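Under the hood, FastChat loads models through Hugging Face Transformers, and quantized loading is handled by its bitsandbytes integration. For reference, here is a minimal standalone sketch of a 4-bit load (the model path is illustrative, and BitsAndBytesConfig is the recommended way to request quantization in recent transformers versions):

# Sketch: 4-bit load with plain Transformers + bitsandbytes.
# Assumes `pip install transformers accelerate bitsandbytes`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # illustrative path

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit weights, fp16 compute
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on GPU/CPU as memory allows
    low_cpu_mem_usage=True,
)

The FastChat change itself lives in the Mistral adapter: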
vi fastchat/model/model_adapter.py
Before the change:
class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
After the change:
class MistralAdapter(BaseModelAdapter):
    """The model adapter for Mistral AI models"""

    def match(self, model_path: str):
        return "mistral" in model_path.lower() or "mixtral" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        # model, tokenizer = super().load_model(model_path, from_pretrained_kwargs)
        # AutoTokenizer and AutoModelForCausalLM are already imported from
        # transformers at the top of model_adapter.py.
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        if "mixtral" in model_path.lower():
            # Load Mixtral quantized; requires the bitsandbytes package.
            # Swap in load_in_8bit=True instead if preferred (slower in my tests).
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                # attn_implementation="flash_attention_2",
                # load_in_8bit=True,
                load_in_4bit=True,
                **from_pretrained_kwargs,
            )
        else:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                low_cpu_mem_usage=True,
                trust_remote_code=True,
                **from_pretrained_kwargs,
            )
        model.config.eos_token_id = tokenizer.eos_token_id
        model.config.pad_token_id = tokenizer.pad_token_id
        return model, tokenizer
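Note that match() keys off the model path string, so the local directory or Hub ID must contain "mixtral" (case-insensitive) for the 4-bit branch to be taken. After saving the change, FastChat can be launched as usual, for example:

python3 -m fastchat.serve.cli --model-path mistralai/Mixtral-8x7B-Instruct-v0.1

(the --model-path value is illustrative; point it at wherever the Mixtral weights actually live)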
Done!