
Analysis of the BERT model's .bin / .ckpt files

Load the .bin weights with PyTorch and print each parameter's name and shape:
    import torch

    # Load the state dict on CPU and list every parameter name with its shape
    fy = torch.load("pytorch_bert_model.bin", map_location=torch.device('cpu'))
    for i in fy.keys():
        print(i + ' ' + str(list(fy[i].size())))

The output is as follows:

    bert.embeddings.word_embeddings.weight [28996, 768]
    bert.embeddings.position_embeddings.weight [512, 768]
    bert.embeddings.token_type_embeddings.weight [2, 768]
    bert.embeddings.LayerNorm.weight [768]
    bert.embeddings.LayerNorm.bias [768]
    bert.encoder.layer.0.attention.self.query.weight [768, 768]
    bert.encoder.layer.0.attention.self.query.bias [768]
    bert.encoder.layer.0.attention.self.key.weight [768, 768]
    bert.encoder.layer.0.attention.self.key.bias [768]
    bert.encoder.layer.0.attention.self.value.weight [768, 768]
    bert.encoder.layer.0.attention.self.value.bias [768]
    bert.encoder.layer.0.attention.output.dense.weight [768, 768]
    bert.encoder.layer.0.attention.output.dense.bias [768]
    bert.encoder.layer.0.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.0.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.0.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.0.intermediate.dense.bias [3072]
    bert.encoder.layer.0.output.dense.weight [768, 3072]
    bert.encoder.layer.0.output.dense.bias [768]
    bert.encoder.layer.0.output.LayerNorm.weight [768]
    bert.encoder.layer.0.output.LayerNorm.bias [768]
    bert.encoder.layer.1.attention.self.query.weight [768, 768]
    bert.encoder.layer.1.attention.self.query.bias [768]
    bert.encoder.layer.1.attention.self.key.weight [768, 768]
    bert.encoder.layer.1.attention.self.key.bias [768]
    bert.encoder.layer.1.attention.self.value.weight [768, 768]
    bert.encoder.layer.1.attention.self.value.bias [768]
    bert.encoder.layer.1.attention.output.dense.weight [768, 768]
    bert.encoder.layer.1.attention.output.dense.bias [768]
    bert.encoder.layer.1.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.1.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.1.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.1.intermediate.dense.bias [3072]
    bert.encoder.layer.1.output.dense.weight [768, 3072]
    bert.encoder.layer.1.output.dense.bias [768]
    bert.encoder.layer.1.output.LayerNorm.weight [768]
    bert.encoder.layer.1.output.LayerNorm.bias [768]
    bert.encoder.layer.2.attention.self.query.weight [768, 768]
    bert.encoder.layer.2.attention.self.query.bias [768]
    bert.encoder.layer.2.attention.self.key.weight [768, 768]
    bert.encoder.layer.2.attention.self.key.bias [768]
    bert.encoder.layer.2.attention.self.value.weight [768, 768]
    bert.encoder.layer.2.attention.self.value.bias [768]
    bert.encoder.layer.2.attention.output.dense.weight [768, 768]
    bert.encoder.layer.2.attention.output.dense.bias [768]
    bert.encoder.layer.2.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.2.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.2.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.2.intermediate.dense.bias [3072]
    bert.encoder.layer.2.output.dense.weight [768, 3072]
    bert.encoder.layer.2.output.dense.bias [768]
    bert.encoder.layer.2.output.LayerNorm.weight [768]
    bert.encoder.layer.2.output.LayerNorm.bias [768]
    bert.encoder.layer.3.attention.self.query.weight [768, 768]
    bert.encoder.layer.3.attention.self.query.bias [768]
    bert.encoder.layer.3.attention.self.key.weight [768, 768]
    bert.encoder.layer.3.attention.self.key.bias [768]
    bert.encoder.layer.3.attention.self.value.weight [768, 768]
    bert.encoder.layer.3.attention.self.value.bias [768]
    bert.encoder.layer.3.attention.output.dense.weight [768, 768]
    bert.encoder.layer.3.attention.output.dense.bias [768]
    bert.encoder.layer.3.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.3.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.3.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.3.intermediate.dense.bias [3072]
    bert.encoder.layer.3.output.dense.weight [768, 3072]
    bert.encoder.layer.3.output.dense.bias [768]
    bert.encoder.layer.3.output.LayerNorm.weight [768]
    bert.encoder.layer.3.output.LayerNorm.bias [768]
    bert.encoder.layer.4.attention.self.query.weight [768, 768]
    bert.encoder.layer.4.attention.self.query.bias [768]
    bert.encoder.layer.4.attention.self.key.weight [768, 768]
    bert.encoder.layer.4.attention.self.key.bias [768]
    bert.encoder.layer.4.attention.self.value.weight [768, 768]
    bert.encoder.layer.4.attention.self.value.bias [768]
    bert.encoder.layer.4.attention.output.dense.weight [768, 768]
    bert.encoder.layer.4.attention.output.dense.bias [768]
    bert.encoder.layer.4.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.4.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.4.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.4.intermediate.dense.bias [3072]
    bert.encoder.layer.4.output.dense.weight [768, 3072]
    bert.encoder.layer.4.output.dense.bias [768]
    bert.encoder.layer.4.output.LayerNorm.weight [768]
    bert.encoder.layer.4.output.LayerNorm.bias [768]
    bert.encoder.layer.5.attention.self.query.weight [768, 768]
    bert.encoder.layer.5.attention.self.query.bias [768]
    bert.encoder.layer.5.attention.self.key.weight [768, 768]
    bert.encoder.layer.5.attention.self.key.bias [768]
    bert.encoder.layer.5.attention.self.value.weight [768, 768]
    bert.encoder.layer.5.attention.self.value.bias [768]
    bert.encoder.layer.5.attention.output.dense.weight [768, 768]
    bert.encoder.layer.5.attention.output.dense.bias [768]
    bert.encoder.layer.5.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.5.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.5.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.5.intermediate.dense.bias [3072]
    bert.encoder.layer.5.output.dense.weight [768, 3072]
    bert.encoder.layer.5.output.dense.bias [768]
    bert.encoder.layer.5.output.LayerNorm.weight [768]
    bert.encoder.layer.5.output.LayerNorm.bias [768]
    bert.encoder.layer.6.attention.self.query.weight [768, 768]
    bert.encoder.layer.6.attention.self.query.bias [768]
    bert.encoder.layer.6.attention.self.key.weight [768, 768]
    bert.encoder.layer.6.attention.self.key.bias [768]
    bert.encoder.layer.6.attention.self.value.weight [768, 768]
    bert.encoder.layer.6.attention.self.value.bias [768]
    bert.encoder.layer.6.attention.output.dense.weight [768, 768]
    bert.encoder.layer.6.attention.output.dense.bias [768]
    bert.encoder.layer.6.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.6.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.6.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.6.intermediate.dense.bias [3072]
    bert.encoder.layer.6.output.dense.weight [768, 3072]
    bert.encoder.layer.6.output.dense.bias [768]
    bert.encoder.layer.6.output.LayerNorm.weight [768]
    bert.encoder.layer.6.output.LayerNorm.bias [768]
    bert.encoder.layer.7.attention.self.query.weight [768, 768]
    bert.encoder.layer.7.attention.self.query.bias [768]
    bert.encoder.layer.7.attention.self.key.weight [768, 768]
    bert.encoder.layer.7.attention.self.key.bias [768]
    bert.encoder.layer.7.attention.self.value.weight [768, 768]
    bert.encoder.layer.7.attention.self.value.bias [768]
    bert.encoder.layer.7.attention.output.dense.weight [768, 768]
    bert.encoder.layer.7.attention.output.dense.bias [768]
    bert.encoder.layer.7.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.7.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.7.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.7.intermediate.dense.bias [3072]
    bert.encoder.layer.7.output.dense.weight [768, 3072]
    bert.encoder.layer.7.output.dense.bias [768]
    bert.encoder.layer.7.output.LayerNorm.weight [768]
    bert.encoder.layer.7.output.LayerNorm.bias [768]
    bert.encoder.layer.8.attention.self.query.weight [768, 768]
    ........
    ........
    bert.encoder.layer.11.attention.self.query.weight [768, 768]
    bert.encoder.layer.11.attention.self.query.bias [768]
    bert.encoder.layer.11.attention.self.key.weight [768, 768]
    bert.encoder.layer.11.attention.self.key.bias [768]
    bert.encoder.layer.11.attention.self.value.weight [768, 768]
    bert.encoder.layer.11.attention.self.value.bias [768]
    bert.encoder.layer.11.attention.output.dense.weight [768, 768]
    bert.encoder.layer.11.attention.output.dense.bias [768]
    bert.encoder.layer.11.attention.output.LayerNorm.weight [768]
    bert.encoder.layer.11.attention.output.LayerNorm.bias [768]
    bert.encoder.layer.11.intermediate.dense.weight [3072, 768]
    bert.encoder.layer.11.intermediate.dense.bias [3072]
    bert.encoder.layer.11.output.dense.weight [768, 3072]
    bert.encoder.layer.11.output.dense.bias [768]
    bert.encoder.layer.11.output.LayerNorm.weight [768]
    bert.encoder.layer.11.output.LayerNorm.bias [768]
    bert.pooler.dense.weight [768, 768]
    bert.pooler.dense.bias [768]
    classifier.weight [1, 768]
    classifier.bias [1]

BERT-base: L=12, H=768, A=12, total parameters = 110M

Here the number of layers (i.e., the number of Transformer blocks) is denoted L, the hidden size (the number of units per layer) is denoted H, and the number of self-attention heads is denoted A. In all experiments the feed-forward/filter size is set to 4H, i.e., 3072 when H=768 (this is what the intermediate layer corresponds to) and 4096 when H=1024.
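As a rough sanity check on the 110M figure, here is a minimal sketch that sums the element counts of all tensors in the state dict loaded above (it assumes the same pytorch_bert_model.bin file; the exact total differs slightly from 110M depending on vocabulary size and the task-specific classifier head):

    import torch

    # Sum the number of elements in every tensor of the state dict.
    fy = torch.load("pytorch_bert_model.bin", map_location=torch.device('cpu'))
    total = sum(t.numel() for t in fy.values())
    print('total parameters: %.1fM' % (total / 1e6))   # roughly 110M for BERT-base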

q, k, and v denote the query, key, and value matrices. Treating <k, v> as a key-value pair, the attention mechanism computes a weighted sum over the Value vectors of the input elements, where q and the Key are used to compute the weight coefficient of each corresponding Value.
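A minimal sketch of that weighted sum, assuming a single head of dimension 64 and random illustrative tensors (not weights taken from the model):

    import torch
    import torch.nn.functional as F

    # q and k produce the weight coefficients; the weights are then used
    # to take a weighted sum over v. Shapes: (batch=1, seq_len=4, head_dim=64).
    q = torch.randn(1, 4, 64)
    k = torch.randn(1, 4, 64)
    v = torch.randn(1, 4, 64)

    scores = q @ k.transpose(-2, -1) / 64 ** 0.5   # similarity of each query with each key
    weights = F.softmax(scores, dim=-1)            # weight coefficients
    output = weights @ v                           # weighted sum of the values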

The base version, bert_base, has 12 layers; the large version has 24. It also has large feed-forward networks (built on hidden sizes of 768 and 1024, respectively) and many attention heads (12 for base, 16 for large).

bert_base has 12 attention heads, each of dimension 64, so the output dimension of every encoder layer is 12 * 64 = 768.
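A minimal sketch of this split into heads, using illustrative shapes (batch size 1, sequence length 128) rather than the real model:

    import torch

    # Split the 768-dim hidden state into 12 heads of size 64 (12 * 64 = 768),
    # then merge the heads back into a single 768-dim vector per token.
    hidden = torch.randn(1, 128, 768)                           # (batch, seq_len, hidden)
    heads = hidden.view(1, 128, 12, 64).permute(0, 2, 1, 3)     # (batch, heads, seq_len, head_dim)
    merged = heads.permute(0, 2, 1, 3).reshape(1, 128, 768)     # back to (batch, seq_len, 768)
    print(torch.equal(hidden, merged))                          # True: the split is lossless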

 

What is the actual flow after attention in the source code?

  • In the Transformer module: before the residual connection, output_layer is passed through dense + dropout, then added to input_layer, and layer_norm of that sum gives attention_output.
  • After all the attention_outputs are obtained and merged, a fully connected (intermediate) layer is applied first, then another dense + dropout; the result is added back to attention_output, and only then is layer_norm applied to obtain the final layer_output (see the sketch after this list).
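A minimal sketch of that flow, assuming hidden size 768 and intermediate size 3072 as in the listing, with freshly initialized layers standing in for the real weights (the actual source code may differ in details such as the activation function):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    hidden_size, intermediate_size = 768, 3072
    dropout = nn.Dropout(0.1)

    # attention.output: dense + dropout, residual add with the layer input, LayerNorm
    attn_dense = nn.Linear(hidden_size, hidden_size)
    attn_norm = nn.LayerNorm(hidden_size)

    # intermediate + output: widen to 3072, activation, project back to 768,
    # dropout, residual add with attention_output, LayerNorm
    intermediate = nn.Linear(hidden_size, intermediate_size)
    out_dense = nn.Linear(intermediate_size, hidden_size)
    out_norm = nn.LayerNorm(hidden_size)

    input_layer = torch.randn(1, 128, hidden_size)      # input to the encoder layer
    self_output = torch.randn(1, 128, hidden_size)      # stand-in for the self-attention result

    attention_output = attn_norm(dropout(attn_dense(self_output)) + input_layer)
    intermediate_output = F.gelu(intermediate(attention_output))
    layer_output = out_norm(dropout(out_dense(intermediate_output)) + attention_output)
    print(layer_output.shape)                           # torch.Size([1, 128, 768])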

The dense layer can simply be understood as a dimensionality transformation (a linear projection).
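For example, the intermediate dense layer transforms 768 dimensions into 3072, which matches the [3072, 768] shape of bert.encoder.layer.*.intermediate.dense.weight in the listing above (nn.Linear stores its weight as [out_features, in_features]):

    import torch.nn as nn

    dense = nn.Linear(768, 3072)      # dimensionality transform: 768 -> 3072
    print(dense.weight.shape)         # torch.Size([3072, 768])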

Layer Norm normalizes the vectors within a layer, which is different from ResNet's skip connection: the former is a regularization technique for sequence models that counters covariate shift, while the latter is used to avoid vanishing gradients during optimization.
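A minimal illustration with a random tensor, which also shows why the LayerNorm weight and bias in the listing have shape [768]:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 128, 768)
    ln = nn.LayerNorm(768)            # learnable weight/bias of shape [768]
    y = ln(x)
    # Each 768-dim token vector is normalized to roughly zero mean and unit variance.
    print(y.mean(dim=-1).abs().max(), y.var(dim=-1, unbiased=False).mean())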

