
llama-2-7b-chat-hf: parameters and sizes


Learning notes.

Model structure

  LlamaConfig {
    "architectures": [
      "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
    "intermediate_size": 11008,
    "max_position_embeddings": 2048,
    "model_type": "llama",
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "pretraining_tp": 1,
    "rms_norm_eps": 1e-06,
    "rope_scaling": null,
    "rope_theta": 10000.0,
    "tie_word_embeddings": false,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.38.2",
    "use_cache": true,
    "vocab_size": 32000
  }

Key figures: 32 hidden layers, 32 attention heads, hidden size 4096, and a vocabulary size of 32000.
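A minimal sketch of how to reproduce the config dump above (the checkpoint id meta-llama/Llama-2-7b-chat-hf is an assumption here; the gated repo requires accepting Meta's license and logging in to the Hugging Face Hub):

  from transformers import AutoConfig

  # Only config.json is fetched here; no model weights are downloaded.
  # Access to the gated meta-llama repo may require `huggingface-cli login` first.
  config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
  print(config)  # prints the LlamaConfig block shown above
  print(config.num_hidden_layers, config.num_attention_heads, config.vocab_size)
  # expected: 32 32 32000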

Parameter names and sizes:

  model.embed_tokens.weight torch.Size([32000, 4096])
  model.layers.0.self_attn.q_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.k_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.v_proj.weight torch.Size([4096, 4096])
  model.layers.0.self_attn.o_proj.weight torch.Size([4096, 4096])
  model.layers.0.mlp.gate_proj.weight torch.Size([11008, 4096])
  model.layers.0.mlp.up_proj.weight torch.Size([11008, 4096])
  model.layers.0.mlp.down_proj.weight torch.Size([4096, 11008])
  model.layers.0.input_layernorm.weight torch.Size([4096])
  model.layers.0.post_attention_layernorm.weight torch.Size([4096])
  (layers 1 through 30 repeat the same pattern as layer 0)
  model.layers.31.self_attn.q_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.k_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.v_proj.weight torch.Size([4096, 4096])
  model.layers.31.self_attn.o_proj.weight torch.Size([4096, 4096])
  model.layers.31.mlp.gate_proj.weight torch.Size([11008, 4096])
  model.layers.31.mlp.up_proj.weight torch.Size([11008, 4096])
  model.layers.31.mlp.down_proj.weight torch.Size([4096, 11008])
  model.layers.31.input_layernorm.weight torch.Size([4096])
  model.layers.31.post_attention_layernorm.weight torch.Size([4096])
  model.norm.weight torch.Size([4096])
  lm_head.weight torch.Size([32000, 4096])
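A listing like the one above can be produced by iterating over model.named_parameters(); the sketch below also sums the element counts. The checkpoint id and dtype choice are assumptions, and loading the full model in bfloat16 needs roughly 13 GB of memory:

  import torch
  from transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-chat-hf",
      torch_dtype=torch.bfloat16,  # matches the torch_dtype in the config
  )

  total = 0
  for name, param in model.named_parameters():
      # e.g. "model.embed_tokens.weight torch.Size([32000, 4096])"
      print(name, param.shape)
      total += param.numel()
  print(f"total parameters: {total:,}")

The total follows directly from the shapes: a 32000x4096 embedding, 32 decoder layers each with four 4096x4096 attention projections, three MLP projections involving the 11008 intermediate size, and two RMSNorm vectors, plus the final norm and the untied 32000x4096 lm_head, which sums to about 6.74 billion parameters.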
