"attention_probs_dropout_prob": 0.1, #乘法attention时,softmax后dropout概率
"directionality": "bidi", "hidden_act": "gelu", # 激活函数 高斯误差线性单元
"hidden_dropout_prob": 0.1, # 隐藏层dropout概率
"hidden_size": 768, # 隐藏单元数
"initializer_range": 0.02, # 权重初始化range
"intermediate_size": 3072, # 升维维度 前馈全连接层维度768-3072-768
"max_position_embeddings": 512, # 最大序列长度,比真实的大的多,但不能减
"num_attention_heads": 12, # #在encoder层中的注意头个数
"num_hidden_layers": 12, # 隐藏层数
"pooler_fc_size": 768, # 【CLS】张量维度
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2, # segment imbadding
"vocab_size": 21128 # 词汇数