
llama-2-7b-chat-hf: parameters and their sizes

Learning notes

Model structure

  LlamaConfig {
    "architectures": [
      "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
    "intermediate_size": 11008,
    "max_position_embeddings": 2048,
    "model_type": "llama",
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "pretraining_tp": 1,
    "rms_norm_eps": 1e-06,
    "rope_scaling": null,
    "rope_theta": 10000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.38.2",
    "use_cache": true,
    "vocab_size": 32000
  }

Key points: 32 hidden layers, 32 attention heads, hidden size 4096, and a vocabulary of 32000 tokens.
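The config dump above can be reproduced with the transformers library. A minimal sketch (it assumes the model id "meta-llama/Llama-2-7b-chat-hf" is accessible, e.g. after accepting the license and logging in; exact field values may differ slightly between transformers versions):

  from transformers import AutoConfig

  # Load only the configuration (only config.json is fetched, no weights).
  config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

  print(config)                      # prints the LlamaConfig block shown above
  print(config.num_hidden_layers)    # 32
  print(config.num_attention_heads)  # 32
  print(config.vocab_size)           # 32000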

Each parameter and its size:

  model.embed_tokens.weight torch.Size([32000, 4096])
  model.layers.0.self_attn.q_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.k_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.v_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.o_proj.weight torch.Size([4096, 4096])
  model.layers.0.mlp.gate_proj.weight torch.Size([11008, 4096])
  model.layers.0.mlp.up_proj.weight torch.Size([11008, 4096])
  model.layers.0.mlp.down_proj.weight torch.Size([4096, 11008])
  model.layers.0.input_layernorm.weight torch.Size([4096])
  model.layers.0.post_attention_layernorm.weight torch.Size([4096])
  ...  (layers 1-30 have the same shapes)
  model.layers.31.self_attn.q_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.k_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.v_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.o_proj.weight torch.Size([4096, 4096])
  model.layers.31.mlp.gate_proj.weight torch.Size([11008, 4096])
  model.layers.31.mlp.up_proj.weight torch.Size([11008, 4096])
  model.layers.31.mlp.down_proj.weight torch.Size([4096, 11008])
  model.layers.31.input_layernorm.weight torch.Size([4096])
  model.layers.31.post_attention_layernorm.weight torch.Size([4096])
  model.norm.weight torch.Size([4096])
  lm_head.weight torch.Size([32000, 4096])
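The listing can be regenerated by iterating over model.named_parameters(). A minimal sketch (it assumes access to the checkpoint and enough RAM to hold it in bfloat16, roughly 13-14 GB):

  import torch
  from transformers import AutoModelForCausalLM

  # Load the full weights on CPU in bfloat16.
  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16
  )

  total = 0
  for name, param in model.named_parameters():
      print(name, param.shape)   # reproduces the listing above
      total += param.numel()

  print(f"total parameters: {total:,}")  # 6,738,415,616, i.e. about 6.74B

Adding up the shapes by hand gives the same number: 32000x4096 (embeddings) + 32 x (4x4096x4096 attention + 3x11008x4096 MLP + 2x4096 norms) + 4096 (final norm) + 32000x4096 (lm_head) = 6,738,415,616, which is where the "7B" in the model name comes from.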
