
Can a Raspberry Pi run BERT?

I got a BERT model running on a Raspberry Pi: the network itself is implemented in pure NumPy, the model is trained (or downloaded) with Hugging Face / PyTorch, the parameters are saved in NumPy format, and inference then loads them with NumPy alone.

I have previously implemented an MLP, a CNN, and an LSTM in NumPy. This time the target is a bigger model, BERT, again in pure NumPy. The main point is that it can run inference on a Raspberry Pi, or on any other board where PyTorch cannot be installed.

The model used here is one I picked more or less at random on Hugging Face: a Chinese news classification model with 7 classes.

Here is the full list of model parameters. The individual entries are not that important; the point is that the saved model alone takes more than 400 MB of disk space (a quick way to verify the size is sketched right after the list).

  1. bert.embeddings.word_embeddings.weight torch.Size([21128, 768])
  2. bert.embeddings.position_embeddings.weight torch.Size([512, 768])
  3. bert.embeddings.token_type_embeddings.weight torch.Size([2, 768])
  4. bert.embeddings.LayerNorm.weight torch.Size([768])
  5. bert.embeddings.LayerNorm.bias torch.Size([768])
  6. bert.encoder.layer.0.attention.self.query.weight torch.Size([768, 768])
  7. bert.encoder.layer.0.attention.self.query.bias torch.Size([768])
  8. bert.encoder.layer.0.attention.self.key.weight torch.Size([768, 768])
  9. bert.encoder.layer.0.attention.self.key.bias torch.Size([768])
  10. bert.encoder.layer.0.attention.self.value.weight torch.Size([768, 768])
  11. bert.encoder.layer.0.attention.self.value.bias torch.Size([768])
  12. bert.encoder.layer.0.attention.output.dense.weight torch.Size([768, 768])
  13. bert.encoder.layer.0.attention.output.dense.bias torch.Size([768])
  14. bert.encoder.layer.0.attention.output.LayerNorm.weight torch.Size([768])
  15. bert.encoder.layer.0.attention.output.LayerNorm.bias torch.Size([768])
  16. bert.encoder.layer.0.intermediate.dense.weight torch.Size([3072, 768])
  17. bert.encoder.layer.0.intermediate.dense.bias torch.Size([3072])
  18. bert.encoder.layer.0.output.dense.weight torch.Size([768, 3072])
  19. bert.encoder.layer.0.output.dense.bias torch.Size([768])
  20. bert.encoder.layer.0.output.LayerNorm.weight torch.Size([768])
  21. bert.encoder.layer.0.output.LayerNorm.bias torch.Size([768])
  22. bert.encoder.layer.1.attention.self.query.weight torch.Size([768, 768])
  23. bert.encoder.layer.1.attention.self.query.bias torch.Size([768])
  24. bert.encoder.layer.1.attention.self.key.weight torch.Size([768, 768])
  25. bert.encoder.layer.1.attention.self.key.bias torch.Size([768])
  26. bert.encoder.layer.1.attention.self.value.weight torch.Size([768, 768])
  27. bert.encoder.layer.1.attention.self.value.bias torch.Size([768])
  28. bert.encoder.layer.1.attention.output.dense.weight torch.Size([768, 768])
  29. bert.encoder.layer.1.attention.output.dense.bias torch.Size([768])
  30. bert.encoder.layer.1.attention.output.LayerNorm.weight torch.Size([768])
  31. bert.encoder.layer.1.attention.output.LayerNorm.bias torch.Size([768])
  32. bert.encoder.layer.1.intermediate.dense.weight torch.Size([3072, 768])
  33. bert.encoder.layer.1.intermediate.dense.bias torch.Size([3072])
  34. bert.encoder.layer.1.output.dense.weight torch.Size([768, 3072])
  35. bert.encoder.layer.1.output.dense.bias torch.Size([768])
  36. bert.encoder.layer.1.output.LayerNorm.weight torch.Size([768])
  37. bert.encoder.layer.1.output.LayerNorm.bias torch.Size([768])
  38. bert.encoder.layer.2.attention.self.query.weight torch.Size([768, 768])
  39. bert.encoder.layer.2.attention.self.query.bias torch.Size([768])
  40. bert.encoder.layer.2.attention.self.key.weight torch.Size([768, 768])
  41. bert.encoder.layer.2.attention.self.key.bias torch.Size([768])
  42. bert.encoder.layer.2.attention.self.value.weight torch.Size([768, 768])
  43. bert.encoder.layer.2.attention.self.value.bias torch.Size([768])
  44. bert.encoder.layer.2.attention.output.dense.weight torch.Size([768, 768])
  45. bert.encoder.layer.2.attention.output.dense.bias torch.Size([768])
  46. bert.encoder.layer.2.attention.output.LayerNorm.weight torch.Size([768])
  47. bert.encoder.layer.2.attention.output.LayerNorm.bias torch.Size([768])
  48. bert.encoder.layer.2.intermediate.dense.weight torch.Size([3072, 768])
  49. bert.encoder.layer.2.intermediate.dense.bias torch.Size([3072])
  50. bert.encoder.layer.2.output.dense.weight torch.Size([768, 3072])
  51. bert.encoder.layer.2.output.dense.bias torch.Size([768])
  52. bert.encoder.layer.2.output.LayerNorm.weight torch.Size([768])
  53. bert.encoder.layer.2.output.LayerNorm.bias torch.Size([768])
  54. bert.encoder.layer.3.attention.self.query.weight torch.Size([768, 768])
  55. bert.encoder.layer.3.attention.self.query.bias torch.Size([768])
  56. bert.encoder.layer.3.attention.self.key.weight torch.Size([768, 768])
  57. bert.encoder.layer.3.attention.self.key.bias torch.Size([768])
  58. bert.encoder.layer.3.attention.self.value.weight torch.Size([768, 768])
  59. bert.encoder.layer.3.attention.self.value.bias torch.Size([768])
  60. bert.encoder.layer.3.attention.output.dense.weight torch.Size([768, 768])
  61. bert.encoder.layer.3.attention.output.dense.bias torch.Size([768])
  62. bert.encoder.layer.3.attention.output.LayerNorm.weight torch.Size([768])
  63. bert.encoder.layer.3.attention.output.LayerNorm.bias torch.Size([768])
  64. bert.encoder.layer.3.intermediate.dense.weight torch.Size([3072, 768])
  65. bert.encoder.layer.3.intermediate.dense.bias torch.Size([3072])
  66. bert.encoder.layer.3.output.dense.weight torch.Size([768, 3072])
  67. bert.encoder.layer.3.output.dense.bias torch.Size([768])
  68. bert.encoder.layer.3.output.LayerNorm.weight torch.Size([768])
  69. bert.encoder.layer.3.output.LayerNorm.bias torch.Size([768])
  70. bert.encoder.layer.4.attention.self.query.weight torch.Size([768, 768])
  71. bert.encoder.layer.4.attention.self.query.bias torch.Size([768])
  72. bert.encoder.layer.4.attention.self.key.weight torch.Size([768, 768])
  73. bert.encoder.layer.4.attention.self.key.bias torch.Size([768])
  74. bert.encoder.layer.4.attention.self.value.weight torch.Size([768, 768])
  75. bert.encoder.layer.4.attention.self.value.bias torch.Size([768])
  76. bert.encoder.layer.4.attention.output.dense.weight torch.Size([768, 768])
  77. bert.encoder.layer.4.attention.output.dense.bias torch.Size([768])
  78. bert.encoder.layer.4.attention.output.LayerNorm.weight torch.Size([768])
  79. bert.encoder.layer.4.attention.output.LayerNorm.bias torch.Size([768])
  80. bert.encoder.layer.4.intermediate.dense.weight torch.Size([3072, 768])
  81. bert.encoder.layer.4.intermediate.dense.bias torch.Size([3072])
  82. bert.encoder.layer.4.output.dense.weight torch.Size([768, 3072])
  83. bert.encoder.layer.4.output.dense.bias torch.Size([768])
  84. bert.encoder.layer.4.output.LayerNorm.weight torch.Size([768])
  85. bert.encoder.layer.4.output.LayerNorm.bias torch.Size([768])
  86. bert.encoder.layer.5.attention.self.query.weight torch.Size([768, 768])
  87. bert.encoder.layer.5.attention.self.query.bias torch.Size([768])
  88. bert.encoder.layer.5.attention.self.key.weight torch.Size([768, 768])
  89. bert.encoder.layer.5.attention.self.key.bias torch.Size([768])
  90. bert.encoder.layer.5.attention.self.value.weight torch.Size([768, 768])
  91. bert.encoder.layer.5.attention.self.value.bias torch.Size([768])
  92. bert.encoder.layer.5.attention.output.dense.weight torch.Size([768, 768])
  93. bert.encoder.layer.5.attention.output.dense.bias torch.Size([768])
  94. bert.encoder.layer.5.attention.output.LayerNorm.weight torch.Size([768])
  95. bert.encoder.layer.5.attention.output.LayerNorm.bias torch.Size([768])
  96. bert.encoder.layer.5.intermediate.dense.weight torch.Size([3072, 768])
  97. bert.encoder.layer.5.intermediate.dense.bias torch.Size([3072])
  98. bert.encoder.layer.5.output.dense.weight torch.Size([768, 3072])
  99. bert.encoder.layer.5.output.dense.bias torch.Size([768])
  100. bert.encoder.layer.5.output.LayerNorm.weight torch.Size([768])
  101. bert.encoder.layer.5.output.LayerNorm.bias torch.Size([768])
  102. bert.encoder.layer.6.attention.self.query.weight torch.Size([768, 768])
  103. bert.encoder.layer.6.attention.self.query.bias torch.Size([768])
  104. bert.encoder.layer.6.attention.self.key.weight torch.Size([768, 768])
  105. bert.encoder.layer.6.attention.self.key.bias torch.Size([768])
  106. bert.encoder.layer.6.attention.self.value.weight torch.Size([768, 768])
  107. bert.encoder.layer.6.attention.self.value.bias torch.Size([768])
  108. bert.encoder.layer.6.attention.output.dense.weight torch.Size([768, 768])
  109. bert.encoder.layer.6.attention.output.dense.bias torch.Size([768])
  110. bert.encoder.layer.6.attention.output.LayerNorm.weight torch.Size([768])
  111. bert.encoder.layer.6.attention.output.LayerNorm.bias torch.Size([768])
  112. bert.encoder.layer.6.intermediate.dense.weight torch.Size([3072, 768])
  113. bert.encoder.layer.6.intermediate.dense.bias torch.Size([3072])
  114. bert.encoder.layer.6.output.dense.weight torch.Size([768, 3072])
  115. bert.encoder.layer.6.output.dense.bias torch.Size([768])
  116. bert.encoder.layer.6.output.LayerNorm.weight torch.Size([768])
  117. bert.encoder.layer.6.output.LayerNorm.bias torch.Size([768])
  118. bert.encoder.layer.7.attention.self.query.weight torch.Size([768, 768])
  119. bert.encoder.layer.7.attention.self.query.bias torch.Size([768])
  120. bert.encoder.layer.7.attention.self.key.weight torch.Size([768, 768])
  121. bert.encoder.layer.7.attention.self.key.bias torch.Size([768])
  122. bert.encoder.layer.7.attention.self.value.weight torch.Size([768, 768])
  123. bert.encoder.layer.7.attention.self.value.bias torch.Size([768])
  124. bert.encoder.layer.7.attention.output.dense.weight torch.Size([768, 768])
  125. bert.encoder.layer.7.attention.output.dense.bias torch.Size([768])
  126. bert.encoder.layer.7.attention.output.LayerNorm.weight torch.Size([768])
  127. bert.encoder.layer.7.attention.output.LayerNorm.bias torch.Size([768])
  128. bert.encoder.layer.7.intermediate.dense.weight torch.Size([3072, 768])
  129. bert.encoder.layer.7.intermediate.dense.bias torch.Size([3072])
  130. bert.encoder.layer.7.output.dense.weight torch.Size([768, 3072])
  131. bert.encoder.layer.7.output.dense.bias torch.Size([768])
  132. bert.encoder.layer.7.output.LayerNorm.weight torch.Size([768])
  133. bert.encoder.layer.7.output.LayerNorm.bias torch.Size([768])
  134. bert.encoder.layer.8.attention.self.query.weight torch.Size([768, 768])
  135. bert.encoder.layer.8.attention.self.query.bias torch.Size([768])
  136. bert.encoder.layer.8.attention.self.key.weight torch.Size([768, 768])
  137. bert.encoder.layer.8.attention.self.key.bias torch.Size([768])
  138. bert.encoder.layer.8.attention.self.value.weight torch.Size([768, 768])
  139. bert.encoder.layer.8.attention.self.value.bias torch.Size([768])
  140. bert.encoder.layer.8.attention.output.dense.weight torch.Size([768, 768])
  141. bert.encoder.layer.8.attention.output.dense.bias torch.Size([768])
  142. bert.encoder.layer.8.attention.output.LayerNorm.weight torch.Size([768])
  143. bert.encoder.layer.8.attention.output.LayerNorm.bias torch.Size([768])
  144. bert.encoder.layer.8.intermediate.dense.weight torch.Size([3072, 768])
  145. bert.encoder.layer.8.intermediate.dense.bias torch.Size([3072])
  146. bert.encoder.layer.8.output.dense.weight torch.Size([768, 3072])
  147. bert.encoder.layer.8.output.dense.bias torch.Size([768])
  148. bert.encoder.layer.8.output.LayerNorm.weight torch.Size([768])
  149. bert.encoder.layer.8.output.LayerNorm.bias torch.Size([768])
  150. bert.encoder.layer.9.attention.self.query.weight torch.Size([768, 768])
  151. bert.encoder.layer.9.attention.self.query.bias torch.Size([768])
  152. bert.encoder.layer.9.attention.self.key.weight torch.Size([768, 768])
  153. bert.encoder.layer.9.attention.self.key.bias torch.Size([768])
  154. bert.encoder.layer.9.attention.self.value.weight torch.Size([768, 768])
  155. bert.encoder.layer.9.attention.self.value.bias torch.Size([768])
  156. bert.encoder.layer.9.attention.output.dense.weight torch.Size([768, 768])
  157. bert.encoder.layer.9.attention.output.dense.bias torch.Size([768])
  158. bert.encoder.layer.9.attention.output.LayerNorm.weight torch.Size([768])
  159. bert.encoder.layer.9.attention.output.LayerNorm.bias torch.Size([768])
  160. bert.encoder.layer.9.intermediate.dense.weight torch.Size([3072, 768])
  161. bert.encoder.layer.9.intermediate.dense.bias torch.Size([3072])
  162. bert.encoder.layer.9.output.dense.weight torch.Size([768, 3072])
  163. bert.encoder.layer.9.output.dense.bias torch.Size([768])
  164. bert.encoder.layer.9.output.LayerNorm.weight torch.Size([768])
  165. bert.encoder.layer.9.output.LayerNorm.bias torch.Size([768])
  166. bert.encoder.layer.10.attention.self.query.weight torch.Size([768, 768])
  167. bert.encoder.layer.10.attention.self.query.bias torch.Size([768])
  168. bert.encoder.layer.10.attention.self.key.weight torch.Size([768, 768])
  169. bert.encoder.layer.10.attention.self.key.bias torch.Size([768])
  170. bert.encoder.layer.10.attention.self.value.weight torch.Size([768, 768])
  171. bert.encoder.layer.10.attention.self.value.bias torch.Size([768])
  172. bert.encoder.layer.10.attention.output.dense.weight torch.Size([768, 768])
  173. bert.encoder.layer.10.attention.output.dense.bias torch.Size([768])
  174. bert.encoder.layer.10.attention.output.LayerNorm.weight torch.Size([768])
  175. bert.encoder.layer.10.attention.output.LayerNorm.bias torch.Size([768])
  176. bert.encoder.layer.10.intermediate.dense.weight torch.Size([3072, 768])
  177. bert.encoder.layer.10.intermediate.dense.bias torch.Size([3072])
  178. bert.encoder.layer.10.output.dense.weight torch.Size([768, 3072])
  179. bert.encoder.layer.10.output.dense.bias torch.Size([768])
  180. bert.encoder.layer.10.output.LayerNorm.weight torch.Size([768])
  181. bert.encoder.layer.10.output.LayerNorm.bias torch.Size([768])
  182. bert.encoder.layer.11.attention.self.query.weight torch.Size([768, 768])
  183. bert.encoder.layer.11.attention.self.query.bias torch.Size([768])
  184. bert.encoder.layer.11.attention.self.key.weight torch.Size([768, 768])
  185. bert.encoder.layer.11.attention.self.key.bias torch.Size([768])
  186. bert.encoder.layer.11.attention.self.value.weight torch.Size([768, 768])
  187. bert.encoder.layer.11.attention.self.value.bias torch.Size([768])
  188. bert.encoder.layer.11.attention.output.dense.weight torch.Size([768, 768])
  189. bert.encoder.layer.11.attention.output.dense.bias torch.Size([768])
  190. bert.encoder.layer.11.attention.output.LayerNorm.weight torch.Size([768])
  191. bert.encoder.layer.11.attention.output.LayerNorm.bias torch.Size([768])
  192. bert.encoder.layer.11.intermediate.dense.weight torch.Size([3072, 768])
  193. bert.encoder.layer.11.intermediate.dense.bias torch.Size([3072])
  194. bert.encoder.layer.11.output.dense.weight torch.Size([768, 3072])
  195. bert.encoder.layer.11.output.dense.bias torch.Size([768])
  196. bert.encoder.layer.11.output.LayerNorm.weight torch.Size([768])
  197. bert.encoder.layer.11.output.LayerNorm.bias torch.Size([768])
  198. bert.pooler.dense.weight torch.Size([768, 768])
  199. bert.pooler.dense.bias torch.Size([768])
  200. classifier.weight torch.Size([7, 768])
  201. classifier.bias torch.Size([7])
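To sanity-check the 400+ MB figure, here is a minimal sketch that just sums up what was saved, assuming the parameters have already been exported to bert_model_params.npz by the export script further down:

import numpy as np

# Sum the element counts and byte sizes of every array stored in the .npz file.
# np.savez writes uncompressed float32 arrays, so the total is close to the file size.
params = np.load('bert_model_params.npz')
n_elems = sum(params[name].size for name in params.files)
n_bytes = sum(params[name].nbytes for name in params.files)
print(f"{len(params.files)} arrays, {n_elems / 1e6:.1f}M parameters, {n_bytes / 2**20:.1f} MB as float32")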

Getting the NumPy BERT to work took two days of falling into pits, comparing intermediate outputs step by step against the Hugging Face implementation. It was genuinely hard.
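One convenient way to do this kind of layer-by-layer comparison is to capture the Hugging Face model's intermediate activations with PyTorch forward hooks. The following is a minimal sketch of that idea (illustrative only, not the exact debugging script I used):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
model.eval()

captured = {}

def save_output(name):
    def hook(module, inputs, output):
        # encoder layers return tuples; keep only the hidden-states tensor
        hidden = output[0] if isinstance(output, tuple) else output
        captured[name] = hidden.detach().numpy()
    return hook

# hook the embedding block and every encoder layer
model.bert.embeddings.register_forward_hook(save_output('embeddings'))
for i, layer in enumerate(model.bert.encoder.layer):
    layer.register_forward_hook(save_output('layer_{}'.format(i)))

with torch.no_grad():
    model(**tokenizer("马拉松比赛", return_tensors='pt'))

# captured['embeddings'], captured['layer_0'], ... can now be diffed against the
# NumPy intermediates, e.g. np.abs(captured['layer_0'] - my_numpy_layer_0).max()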

Here is the BERT code implemented in NumPy. The scores differ very slightly from Hugging Face, probably because the model is large and small numerical errors in the saved parameters accumulate.

Reading the code below is a good way to understand the BERT architecture directly: every detail is there, and each piece is simple. I'm rather pleased with how it turned out.

import numpy as np


def word_embedding(input_ids, word_embeddings):
    return word_embeddings[input_ids]


def position_embedding(position_ids, position_embeddings):
    return position_embeddings[position_ids]


def token_type_embedding(token_type_ids, token_type_embeddings):
    return token_type_embeddings[token_type_ids]


def softmax(x, axis=None):
    # subtract the max for numerical stability
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    sum_ex = np.sum(e_x, axis=axis, keepdims=True).astype(np.float32)
    return e_x / sum_ex


def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = np.matmul(Q, K.transpose(0, 2, 1)) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, np.full_like(scores, -np.inf))
    attention_weights = softmax(scores, axis=-1)
    output = np.matmul(attention_weights, V)
    return output, attention_weights


def multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O):
    q = np.matmul(input, W_Q.T) + B_Q
    k = np.matmul(input, W_K.T) + B_K
    v = np.matmul(input, W_V.T) + B_V
    # split the projections into num_heads heads along the hidden dimension
    q = np.split(q, num_heads, axis=-1)
    k = np.split(k, num_heads, axis=-1)
    v = np.split(v, num_heads, axis=-1)
    outputs = []
    for q_, k_, v_ in zip(q, k, v):
        output, attention_weights = scaled_dot_product_attention(q_, k_, v_)
        outputs.append(output)
    outputs = np.concatenate(outputs, axis=-1)
    outputs = np.matmul(outputs, W_O.T) + B_O
    return outputs


def layer_normalization(x, weight, bias, eps=1e-12):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    std = np.sqrt(variance + eps)
    normalized_x = (x - mean) / std
    output = weight * normalized_x + bias
    return output


def feed_forward_layer(inputs, weight, bias, activation='relu'):
    linear_output = np.matmul(inputs, weight) + bias
    if activation == 'relu':
        activated_output = np.maximum(0, linear_output)  # ReLU
    elif activation == 'gelu':
        # tanh approximation of GELU (Hugging Face uses the exact erf form by default,
        # which may account for part of the tiny score difference)
        activated_output = 0.5 * linear_output * (1 + np.tanh(np.sqrt(2 / np.pi) * (linear_output + 0.044715 * np.power(linear_output, 3))))
    elif activation == "tanh":
        activated_output = np.tanh(linear_output)
    else:
        activated_output = linear_output  # no activation
    return activated_output


def residual_connection(inputs, residual):
    # residual connection
    return inputs + residual


def tokenize_sentence(sentence, vocab_file='vocab.txt'):
    with open(vocab_file, 'r', encoding='utf-8') as f:
        vocab = f.readlines()
    vocab = [i.strip() for i in vocab]
    # character-level tokenization, with [CLS] at the start and [SEP] at the end
    tokenized_sentence = ['[CLS]'] + list(sentence) + ["[SEP]"]
    token_ids = [vocab.index(token) for token in tokenized_sentence]
    return token_ids


# load the saved model parameters
model_data = np.load('bert_model_params.npz')
word_embeddings = model_data["bert.embeddings.word_embeddings.weight"]
position_embeddings = model_data["bert.embeddings.position_embeddings.weight"]
token_type_embeddings = model_data["bert.embeddings.token_type_embeddings.weight"]


def model_input(sentence):
    token_ids = tokenize_sentence(sentence)
    input_ids = np.array(token_ids)                  # token ids
    word_embedded = word_embedding(input_ids, word_embeddings)
    position_ids = np.array(range(len(input_ids)))   # position ids
    # position embedding matrix, shape (max_position, embedding_size)
    position_embedded = position_embedding(position_ids, position_embeddings)
    token_type_ids = np.array([0] * len(input_ids))  # segment ids (single segment)
    # token type embedding matrix, shape (num_token_types, embedding_size)
    token_type_embedded = token_type_embedding(token_type_ids, token_type_embeddings)
    embedding_output = np.expand_dims(word_embedded + position_embedded + token_type_embedded, axis=0)
    return embedding_output


def bert(input, num_heads):
    ebd_LayerNorm_weight = model_data['bert.embeddings.LayerNorm.weight']
    ebd_LayerNorm_bias = model_data['bert.embeddings.LayerNorm.bias']
    input = layer_normalization(input, ebd_LayerNorm_weight, ebd_LayerNorm_bias)  # matches the Hugging Face output here
    for i in range(12):
        # weights of layer i: multi-head self-attention
        W_Q = model_data['bert.encoder.layer.{}.attention.self.query.weight'.format(i)]
        B_Q = model_data['bert.encoder.layer.{}.attention.self.query.bias'.format(i)]
        W_K = model_data['bert.encoder.layer.{}.attention.self.key.weight'.format(i)]
        B_K = model_data['bert.encoder.layer.{}.attention.self.key.bias'.format(i)]
        W_V = model_data['bert.encoder.layer.{}.attention.self.value.weight'.format(i)]
        B_V = model_data['bert.encoder.layer.{}.attention.self.value.bias'.format(i)]
        W_O = model_data['bert.encoder.layer.{}.attention.output.dense.weight'.format(i)]
        B_O = model_data['bert.encoder.layer.{}.attention.output.dense.bias'.format(i)]
        attention_output_LayerNorm_weight = model_data['bert.encoder.layer.{}.attention.output.LayerNorm.weight'.format(i)]
        attention_output_LayerNorm_bias = model_data['bert.encoder.layer.{}.attention.output.LayerNorm.bias'.format(i)]
        # weights of layer i: feed-forward block
        intermediate_weight = model_data['bert.encoder.layer.{}.intermediate.dense.weight'.format(i)]
        intermediate_bias = model_data['bert.encoder.layer.{}.intermediate.dense.bias'.format(i)]
        dense_weight = model_data['bert.encoder.layer.{}.output.dense.weight'.format(i)]
        dense_bias = model_data['bert.encoder.layer.{}.output.dense.bias'.format(i)]
        output_LayerNorm_weight = model_data['bert.encoder.layer.{}.output.LayerNorm.weight'.format(i)]
        output_LayerNorm_bias = model_data['bert.encoder.layer.{}.output.LayerNorm.bias'.format(i)]
        # self-attention -> residual -> LayerNorm
        output = multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O)
        output = residual_connection(input, output)
        output1 = layer_normalization(output, attention_output_LayerNorm_weight, attention_output_LayerNorm_bias)  # matches the Hugging Face output here
        # feed-forward -> residual -> LayerNorm
        output = feed_forward_layer(output1, intermediate_weight.T, intermediate_bias, activation='gelu')
        output = feed_forward_layer(output, dense_weight.T, dense_bias, activation='')
        output = residual_connection(output1, output)
        output2 = layer_normalization(output, output_LayerNorm_weight, output_LayerNorm_bias)  # matches
        input = output2
    # pooler: dense + tanh (applied to the whole sequence here; only position 0 is used later)
    bert_pooler_dense_weight = model_data['bert.pooler.dense.weight']
    bert_pooler_dense_bias = model_data['bert.pooler.dense.bias']
    output = feed_forward_layer(output2, bert_pooler_dense_weight.T, bert_pooler_dense_bias, activation='tanh')  # matches
    return output


id2label = {0: 'mainland China politics', 1: 'Hong Kong - Macau politics', 2: 'International news', 3: 'financial news', 4: 'culture', 5: 'entertainment', 6: 'sports'}
classifier_weight = model_data['classifier.weight']
classifier_bias = model_data['classifier.bias']

if __name__ == "__main__":
    # example Chinese news headlines
    sentences = ["马拉松比赛", "香港有群众游行示威", "党中央决定制定爱国教育法", "俄罗斯和欧美对抗", "人民币汇率贬值", "端午节吃粽子", "大妈们跳广场舞"]
    while True:  # keep looping over the examples
        for sentence in sentences:
            output = bert(model_input(sentence), num_heads=12)
            # classification head on the pooled [CLS] position
            output = feed_forward_layer(output[:, 0, :], classifier_weight.T, classifier_bias, activation='')
            output = softmax(output, axis=-1)
            label_id = np.argmax(output, axis=-1)
            label_score = output[0][label_id]
            print("sentence:", sentence, "\tlabels:", id2label[label_id[0]], "\tscore:", label_score)

The model is someone else's pretrained checkpoint found on Hugging Face, a RoBERTa model fine-tuned for 7-class news classification. The script below loads it and saves its parameters in NumPy format so that the code above can load them.

import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
text_classification = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(text_classification("马拉松决赛"))

# print the name and shape of every model parameter
for name, param in model.named_parameters():
    print(name, param.data.shape)

# save the model parameters in NumPy format
model_params = {name: param.data.cpu().numpy() for name, param in model.named_parameters()}
np.savez('bert_model_params.npz', **model_params)
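As a quick sanity check after the export (a sketch assuming it runs in the same session as the script above), the shapes stored in the .npz can be compared against the PyTorch parameters:

# verify that every exported array has the same shape as the corresponding parameter
exported = np.load('bert_model_params.npz')
for name, param in model.named_parameters():
    assert exported[name].shape == tuple(param.shape), name
print("all", len(exported.files), "parameter shapes match")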

Comparing the two results (Hugging Face pipeline vs. the NumPy implementation):

Hugging Face: [{'label': 'sports', 'score': 0.9929242134094238}]
NumPy:        sports [0.9928773]
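To put a number on that small gap, the two probability vectors can be compared directly. A minimal sketch, assuming the NumPy code above is saved as numpy_bert.py (a file name chosen here for illustration) next to vocab.txt and bert_model_params.npz:

import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

import numpy_bert as nb  # the NumPy implementation above, saved as numpy_bert.py

model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
model.eval()

sentence = "马拉松比赛"

# Hugging Face probabilities
with torch.no_grad():
    logits = model(**tokenizer(sentence, return_tensors='pt')).logits
hf_probs = torch.softmax(logits, dim=-1).numpy()[0]

# NumPy probabilities, using the functions and classifier weights defined above
hidden = nb.bert(nb.model_input(sentence), num_heads=12)
np_logits = nb.feed_forward_layer(hidden[:, 0, :], nb.classifier_weight.T, nb.classifier_bias, activation='')
np_probs = nb.softmax(np_logits, axis=-1)[0]

print("max abs difference:", np.max(np.abs(hf_probs - np_probs)))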
