
transformers - Generation with LLMs


https://huggingface.co/docs/transformers/main/en/llm_tutorial

The stopping condition is determined by the model: the model should learn when to output an end-of-sequence (EOS) token. If it does not, generation stops when a predefined maximum length is reached.

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
    )

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
    model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")

    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    'A list of colors: red, blue, green, yellow, orange, purple, pink,'

    tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
    model_inputs = tokenizer(
        ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
    ).to("cuda")
    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    ['A list of colors: red, blue, green, yellow, orange, purple, pink,',
     'Portugal is a country in southwestern Europe, on the Iber']
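
Returning to the stopping condition described at the top, here is a minimal sketch of capping generation explicitly, assuming the model, tokenizer, and model_inputs from above (the specific values are illustrative, not from the original tutorial):

    # Generation stops either when the EOS token is produced or when
    # max_new_tokens is reached, whichever comes first.
    generated_ids = model.generate(
        **model_inputs,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=40,
    )
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))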

There are many generation strategies, and sometimes the default values may not be appropriate for your use case. The sections below cover the most common pitfalls and how to avoid them.
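
For example, a decoding strategy can be bundled in a GenerationConfig and handed to generate. A minimal sketch, assuming the model and model_inputs from above (the beam-search settings here are illustrative only):

    from transformers import GenerationConfig

    # Beam search with 4 beams instead of the default greedy decoding
    beam_config = GenerationConfig(num_beams=4, early_stopping=True, max_new_tokens=30)
    generated_ids = model.generate(**model_inputs, generation_config=beam_config)
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])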

Generated output is too short or too long

If not specified in the GenerationConfig file, generate returns up to 20 tokens by default. We recommend manually setting max_new_tokens in the generate call to control the maximum number of new tokens it can return. Note that LLMs (more precisely, decoder-only models) also return the input prompt as part of the output.

    model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to("cuda")

    # By default, the output will contain up to 20 tokens
    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    'A sequence of numbers: 1, 2, 3, 4, 5'

    # Setting `max_new_tokens` allows you to control the maximum length
    generated_ids = model.generate(**model_inputs, max_new_tokens=50)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    'A sequence of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,'
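
Since decoder-only models echo the prompt in their output, here is a minimal sketch of stripping it, reusing model_inputs and generated_ids from the snippet above:

    # The prompt occupies the first input_length positions of each output row;
    # slicing them off leaves only the newly generated tokens.
    input_length = model_inputs.input_ids.shape[1]
    new_tokens = generated_ids[:, input_length:]
    print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0])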

Incorrect generation mode

By default, and unless specified in the GenerationConfig file, generate selects the most likely token at each iteration (greedy decoding).

    # Set seed for reproducibility -- you don't need this unless you want full reproducibility
    from transformers import set_seed
    set_seed(42)

    model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to("cuda")

    # LLM + greedy decoding = repetitive, boring output
    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    'I am a cat. I am a cat. I am a cat. I am a cat'

    # With sampling, the output becomes more creative!
    generated_ids = model.generate(**model_inputs, do_sample=True)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    'I am a cat. Specifically, I am an indoor-only cat. I'
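
Sampling can be shaped further with a few common knobs. A minimal sketch reusing the model_inputs above (the specific temperature/top-k/top-p values are illustrative, not from the original):

    # temperature flattens or sharpens the distribution; top_k and top_p
    # restrict sampling to the most likely candidate tokens
    generated_ids = model.generate(
        **model_inputs,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        max_new_tokens=30,
    )
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])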

Wrong padding side

LLMs are decoder-only architectures, which means they keep iterating on the input prompt. If your inputs are not all the same length, they need to be padded. Since LLMs are not trained to continue generation from pad tokens, the inputs must be left-padded. Also make sure to pass the attention mask to generate; an explicit example is sketched after the snippet below.

    # Re-initialize the tokenizer with the default (right) padding side: the 1st sequence,
    # which is shorter, gets padded on the right side. Generation fails to capture the logic.
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
    model_inputs = tokenizer(
        ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
    ).to("cuda")
    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    '1, 2, 33333333333'

    # With left-padding, it works as expected!
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
    tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
    model_inputs = tokenizer(
        ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
    ).to("cuda")
    generated_ids = model.generate(**model_inputs)
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    '1, 2, 3, 4, 5, 6,'
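
As a reminder of the attention-mask point above, a minimal sketch of passing the mask explicitly instead of unpacking model_inputs (same tokenizer and model as above):

    # Equivalent to model.generate(**model_inputs): the attention mask tells
    # generate which positions are padding and should be ignored.
    generated_ids = model.generate(
        input_ids=model_inputs.input_ids,
        attention_mask=model_inputs.attention_mask,
    )
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])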

Wrong prompt

Some models and tasks expect a specific input prompt format to work properly. If that format is not used, you get a silent performance degradation: the model still runs, but not as well as it would with the expected prompt.

    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
    )
    set_seed(0)
    prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""
    model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    input_length = model_inputs.input_ids.shape[1]
    generated_ids = model.generate(**model_inputs, max_new_tokens=20)
    print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
    "I'm not a thug, but i can tell you that a human cannot eat"

    # Oh no, it did not follow our instruction to reply as a thug! Let's see what happens when we write
    # a better prompt and use the right template for this model (through `tokenizer.apply_chat_template`)
    set_seed(0)
    messages = [
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a thug",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ]
    model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
    input_length = model_inputs.shape[1]
    generated_ids = model.generate(model_inputs, do_sample=True, max_new_tokens=20)
    print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
    'None, you thug. How bout you try to focus on more useful questions?'
    # As we can see, it followed a proper thug style
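
To see what the chat template actually produces, here is a minimal sketch that renders the formatted prompt as plain text instead of token ids, reusing the messages list above:

    # tokenize=False returns the templated prompt string, so you can inspect
    # the role markers and special tokens the model expects
    prompt_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt_text)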