
Notes on Testing the Qwen-7B Model on Kylin OS SP2 with the Ascend 300I


1. Check the system version

uname -a

Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

2. Check the NPU cards

npu-smi info

Background:

The official documentation lists support for the Ascend 910 architecture. I happened to have 300I hardware available, so I tried it out and am sharing the record here for reference. I'm still a beginner and have plenty to learn from more experienced users.

The test mainly follows the method described at https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-support; I have not explored it in more depth for now.

My understanding so far: this platform can run basic models, but because the architecture is different, algorithms may need to be adapted. Frameworks such as PyTorch, TensorFlow, and MindFormers can be installed.
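
As a quick sanity check (a minimal sketch, assuming MindSpore is already installed in the current environment), you can verify that MindSpore detects the Ascend device:

python3 -c "import mindspore; mindspore.set_context(device_target='Ascend'); mindspore.run_check()"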

Check the detailed parameters:

uname -m && cat /etc/*release

 

aarch64
Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
Kylin Linux Advanced Server release V10 (Sword)

3. Set up Docker. There are two ways to install it: download it from the official site, or install it directly with yum.
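
For the yum route, a minimal sketch looks like the following (the exact package name depends on the repositories configured on your Kylin system):

sudo yum install -y docker
sudo systemctl enable --now docker
sudo docker info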

4. Install Miniconda; make sure to pick the aarch64 build.
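
For example (the installer filename below is an assumption; check the Miniconda download page for the current aarch64 release):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
bash Miniconda3-latest-Linux-aarch64.sh -b -p /root/miniconda3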

5. Configure the environment following the tutorial. I won't go into detail here and will just give the record of what I ran.
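
If you need a comparable Python environment outside the container (the qwenllm/qwen-mindspore image already ships one named mindspore2.2_py39, as the paths in the logs below show), something like the following would do; the MindSpore packages to install on top of it are the ones given in the tutorial:

conda create -n mindspore2.2_py39 python=3.9 -y
conda activate mindspore2.2_py39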

6. I did not use the Docker startup command from the tutorial; I used the following command instead.

sudo docker run -it --rm -u root --network=host --ipc=host \
  --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
  --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
  --name=6bff46b104b8 \
  --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/qwen/Qwen-7B-Chat:/data/qwen/models/Qwen-7B-Chat \
  -v /var/log/npu/:/usr/slog \
  qwenllm/qwen-mindspore /bin/bash

Docker started successfully.
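
Inside the container, it is worth re-running npu-smi info (the binary is bind-mounted into the container by the command above) to confirm that the davinci devices are visible before continuing:

npu-smi info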

7. Convert the model weights

python3 /data/qwen/mindformers/research/qwen/convert_weight.py

  1. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  2. setattr(self, word, getattr(machar, word).flat[0])
  3. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  4. return self._float_to_str(self.smallest_subnormal)
  5. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  6. setattr(self, word, getattr(machar, word).flat[0])
  7. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  8. return self._float_to_str(self.smallest_subnormal)
  9. Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
  10. Flash attention will be disabled because it does NOT support fp32.
  11. Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
  12. Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
  13. Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
  14. Loading checkpoint shards: 100%|??????????????????????????????????????????????????????????????????????????????| 8/8 [00:03<00:00, 2.35it/s]
  15. Parameter (name=transformer.wte.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
  16. name: transformer.wte.weight->transformer.wte.embedding_weight
  17. Parameter (name=transformer.h.0.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  18. name: transformer.h.0.ln_1.weight->transformer.layers.0.attention_norm.weight
  19. Parameter (name=transformer.h.0.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  20. name: transformer.h.0.attn.c_attn.weight->transformer.layers.0.attn.c_attn.weight
  21. Parameter (name=transformer.h.0.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  22. name: transformer.h.0.attn.c_attn.bias->transformer.layers.0.attn.c_attn.bias
  23. Parameter (name=transformer.h.0.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  24. name: transformer.h.0.attn.c_proj.weight->transformer.layers.0.attention.wo.weight
  25. Parameter (name=transformer.h.0.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  26. name: transformer.h.0.ln_2.weight->transformer.layers.0.ffn_norm.weight
  27. Parameter (name=transformer.h.0.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  28. name: transformer.h.0.mlp.w1.weight->transformer.layers.0.feed_forward.w1.weight
  29. Parameter (name=transformer.h.0.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  30. name: transformer.h.0.mlp.w2.weight->transformer.layers.0.feed_forward.w3.weight
  31. Parameter (name=transformer.h.0.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  32. name: transformer.h.0.mlp.c_proj.weight->transformer.layers.0.feed_forward.w2.weight
  33. Parameter (name=transformer.h.1.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  34. name: transformer.h.1.ln_1.weight->transformer.layers.1.attention_norm.weight
  35. Parameter (name=transformer.h.1.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  36. name: transformer.h.1.attn.c_attn.weight->transformer.layers.1.attn.c_attn.weight
  37. Parameter (name=transformer.h.1.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  38. name: transformer.h.1.attn.c_attn.bias->transformer.layers.1.attn.c_attn.bias
  39. Parameter (name=transformer.h.1.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  40. name: transformer.h.1.attn.c_proj.weight->transformer.layers.1.attention.wo.weight
  41. Parameter (name=transformer.h.1.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  42. name: transformer.h.1.ln_2.weight->transformer.layers.1.ffn_norm.weight
  43. Parameter (name=transformer.h.1.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  44. name: transformer.h.1.mlp.w1.weight->transformer.layers.1.feed_forward.w1.weight
  45. Parameter (name=transformer.h.1.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  46. name: transformer.h.1.mlp.w2.weight->transformer.layers.1.feed_forward.w3.weight
  47. Parameter (name=transformer.h.1.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  48. name: transformer.h.1.mlp.c_proj.weight->transformer.layers.1.feed_forward.w2.weight
  49. Parameter (name=transformer.h.2.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  50. name: transformer.h.2.ln_1.weight->transformer.layers.2.attention_norm.weight
  51. Parameter (name=transformer.h.2.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  52. name: transformer.h.2.attn.c_attn.weight->transformer.layers.2.attn.c_attn.weight
  53. Parameter (name=transformer.h.2.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  54. name: transformer.h.2.attn.c_attn.bias->transformer.layers.2.attn.c_attn.bias
  55. Parameter (name=transformer.h.2.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  56. name: transformer.h.2.attn.c_proj.weight->transformer.layers.2.attention.wo.weight
  57. Parameter (name=transformer.h.2.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  58. name: transformer.h.2.ln_2.weight->transformer.layers.2.ffn_norm.weight
  59. Parameter (name=transformer.h.2.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  60. name: transformer.h.2.mlp.w1.weight->transformer.layers.2.feed_forward.w1.weight
  61. Parameter (name=transformer.h.2.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  62. name: transformer.h.2.mlp.w2.weight->transformer.layers.2.feed_forward.w3.weight
  63. Parameter (name=transformer.h.2.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  64. name: transformer.h.2.mlp.c_proj.weight->transformer.layers.2.feed_forward.w2.weight
  65. Parameter (name=transformer.h.3.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  66. name: transformer.h.3.ln_1.weight->transformer.layers.3.attention_norm.weight
  67. Parameter (name=transformer.h.3.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  68. name: transformer.h.3.attn.c_attn.weight->transformer.layers.3.attn.c_attn.weight
  69. Parameter (name=transformer.h.3.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  70. name: transformer.h.3.attn.c_attn.bias->transformer.layers.3.attn.c_attn.bias
  71. Parameter (name=transformer.h.3.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  72. name: transformer.h.3.attn.c_proj.weight->transformer.layers.3.attention.wo.weight
  73. Parameter (name=transformer.h.3.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  74. name: transformer.h.3.ln_2.weight->transformer.layers.3.ffn_norm.weight
  75. Parameter (name=transformer.h.3.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  76. name: transformer.h.3.mlp.w1.weight->transformer.layers.3.feed_forward.w1.weight
  77. Parameter (name=transformer.h.3.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  78. name: transformer.h.3.mlp.w2.weight->transformer.layers.3.feed_forward.w3.weight
  79. Parameter (name=transformer.h.3.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  80. name: transformer.h.3.mlp.c_proj.weight->transformer.layers.3.feed_forward.w2.weight
  81. Parameter (name=transformer.h.4.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  82. name: transformer.h.4.ln_1.weight->transformer.layers.4.attention_norm.weight
  83. Parameter (name=transformer.h.4.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  84. name: transformer.h.4.attn.c_attn.weight->transformer.layers.4.attn.c_attn.weight
  85. Parameter (name=transformer.h.4.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  86. name: transformer.h.4.attn.c_attn.bias->transformer.layers.4.attn.c_attn.bias
  87. Parameter (name=transformer.h.4.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  88. name: transformer.h.4.attn.c_proj.weight->transformer.layers.4.attention.wo.weight
  89. Parameter (name=transformer.h.4.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  90. name: transformer.h.4.ln_2.weight->transformer.layers.4.ffn_norm.weight
  91. Parameter (name=transformer.h.4.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  92. name: transformer.h.4.mlp.w1.weight->transformer.layers.4.feed_forward.w1.weight
  93. Parameter (name=transformer.h.4.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  94. name: transformer.h.4.mlp.w2.weight->transformer.layers.4.feed_forward.w3.weight
  95. Parameter (name=transformer.h.4.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  96. name: transformer.h.4.mlp.c_proj.weight->transformer.layers.4.feed_forward.w2.weight
  97. Parameter (name=transformer.h.5.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  98. name: transformer.h.5.ln_1.weight->transformer.layers.5.attention_norm.weight
  99. Parameter (name=transformer.h.5.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  100. name: transformer.h.5.attn.c_attn.weight->transformer.layers.5.attn.c_attn.weight
  101. Parameter (name=transformer.h.5.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  102. name: transformer.h.5.attn.c_attn.bias->transformer.layers.5.attn.c_attn.bias
  103. Parameter (name=transformer.h.5.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  104. name: transformer.h.5.attn.c_proj.weight->transformer.layers.5.attention.wo.weight
  105. Parameter (name=transformer.h.5.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  106. name: transformer.h.5.ln_2.weight->transformer.layers.5.ffn_norm.weight
  107. Parameter (name=transformer.h.5.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  108. name: transformer.h.5.mlp.w1.weight->transformer.layers.5.feed_forward.w1.weight
  109. Parameter (name=transformer.h.5.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  110. name: transformer.h.5.mlp.w2.weight->transformer.layers.5.feed_forward.w3.weight
  111. Parameter (name=transformer.h.5.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  112. name: transformer.h.5.mlp.c_proj.weight->transformer.layers.5.feed_forward.w2.weight
  113. Parameter (name=transformer.h.6.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  114. name: transformer.h.6.ln_1.weight->transformer.layers.6.attention_norm.weight
  115. Parameter (name=transformer.h.6.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  116. name: transformer.h.6.attn.c_attn.weight->transformer.layers.6.attn.c_attn.weight
  117. Parameter (name=transformer.h.6.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  118. name: transformer.h.6.attn.c_attn.bias->transformer.layers.6.attn.c_attn.bias
  119. Parameter (name=transformer.h.6.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  120. name: transformer.h.6.attn.c_proj.weight->transformer.layers.6.attention.wo.weight
  121. Parameter (name=transformer.h.6.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  122. name: transformer.h.6.ln_2.weight->transformer.layers.6.ffn_norm.weight
  123. Parameter (name=transformer.h.6.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  124. name: transformer.h.6.mlp.w1.weight->transformer.layers.6.feed_forward.w1.weight
  125. Parameter (name=transformer.h.6.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  126. name: transformer.h.6.mlp.w2.weight->transformer.layers.6.feed_forward.w3.weight
  127. Parameter (name=transformer.h.6.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  128. name: transformer.h.6.mlp.c_proj.weight->transformer.layers.6.feed_forward.w2.weight
  129. Parameter (name=transformer.h.7.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  130. name: transformer.h.7.ln_1.weight->transformer.layers.7.attention_norm.weight
  131. Parameter (name=transformer.h.7.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  132. name: transformer.h.7.attn.c_attn.weight->transformer.layers.7.attn.c_attn.weight
  133. Parameter (name=transformer.h.7.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  134. name: transformer.h.7.attn.c_attn.bias->transformer.layers.7.attn.c_attn.bias
  135. Parameter (name=transformer.h.7.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  136. name: transformer.h.7.attn.c_proj.weight->transformer.layers.7.attention.wo.weight
  137. Parameter (name=transformer.h.7.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  138. name: transformer.h.7.ln_2.weight->transformer.layers.7.ffn_norm.weight
  139. Parameter (name=transformer.h.7.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  140. name: transformer.h.7.mlp.w1.weight->transformer.layers.7.feed_forward.w1.weight
  141. Parameter (name=transformer.h.7.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  142. name: transformer.h.7.mlp.w2.weight->transformer.layers.7.feed_forward.w3.weight
  143. Parameter (name=transformer.h.7.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  144. name: transformer.h.7.mlp.c_proj.weight->transformer.layers.7.feed_forward.w2.weight
  145. Parameter (name=transformer.h.8.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  146. name: transformer.h.8.ln_1.weight->transformer.layers.8.attention_norm.weight
  147. Parameter (name=transformer.h.8.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  148. name: transformer.h.8.attn.c_attn.weight->transformer.layers.8.attn.c_attn.weight
  149. Parameter (name=transformer.h.8.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  150. name: transformer.h.8.attn.c_attn.bias->transformer.layers.8.attn.c_attn.bias
  151. Parameter (name=transformer.h.8.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  152. name: transformer.h.8.attn.c_proj.weight->transformer.layers.8.attention.wo.weight
  153. Parameter (name=transformer.h.8.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  154. name: transformer.h.8.ln_2.weight->transformer.layers.8.ffn_norm.weight
  155. Parameter (name=transformer.h.8.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  156. name: transformer.h.8.mlp.w1.weight->transformer.layers.8.feed_forward.w1.weight
  157. Parameter (name=transformer.h.8.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  158. name: transformer.h.8.mlp.w2.weight->transformer.layers.8.feed_forward.w3.weight
  159. Parameter (name=transformer.h.8.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  160. name: transformer.h.8.mlp.c_proj.weight->transformer.layers.8.feed_forward.w2.weight
  161. Parameter (name=transformer.h.9.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  162. name: transformer.h.9.ln_1.weight->transformer.layers.9.attention_norm.weight
  163. Parameter (name=transformer.h.9.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  164. name: transformer.h.9.attn.c_attn.weight->transformer.layers.9.attn.c_attn.weight
  165. Parameter (name=transformer.h.9.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  166. name: transformer.h.9.attn.c_attn.bias->transformer.layers.9.attn.c_attn.bias
  167. Parameter (name=transformer.h.9.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  168. name: transformer.h.9.attn.c_proj.weight->transformer.layers.9.attention.wo.weight
  169. Parameter (name=transformer.h.9.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  170. name: transformer.h.9.ln_2.weight->transformer.layers.9.ffn_norm.weight
  171. Parameter (name=transformer.h.9.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  172. name: transformer.h.9.mlp.w1.weight->transformer.layers.9.feed_forward.w1.weight
  173. Parameter (name=transformer.h.9.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  174. name: transformer.h.9.mlp.w2.weight->transformer.layers.9.feed_forward.w3.weight
  175. Parameter (name=transformer.h.9.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  176. name: transformer.h.9.mlp.c_proj.weight->transformer.layers.9.feed_forward.w2.weight
  177. Parameter (name=transformer.h.10.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  178. name: transformer.h.10.ln_1.weight->transformer.layers.10.attention_norm.weight
  179. Parameter (name=transformer.h.10.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  180. name: transformer.h.10.attn.c_attn.weight->transformer.layers.10.attn.c_attn.weight
  181. Parameter (name=transformer.h.10.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  182. name: transformer.h.10.attn.c_attn.bias->transformer.layers.10.attn.c_attn.bias
  183. Parameter (name=transformer.h.10.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  184. name: transformer.h.10.attn.c_proj.weight->transformer.layers.10.attention.wo.weight
  185. Parameter (name=transformer.h.10.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  186. name: transformer.h.10.ln_2.weight->transformer.layers.10.ffn_norm.weight
  187. Parameter (name=transformer.h.10.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  188. name: transformer.h.10.mlp.w1.weight->transformer.layers.10.feed_forward.w1.weight
  189. Parameter (name=transformer.h.10.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  190. name: transformer.h.10.mlp.w2.weight->transformer.layers.10.feed_forward.w3.weight
  191. Parameter (name=transformer.h.10.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  192. name: transformer.h.10.mlp.c_proj.weight->transformer.layers.10.feed_forward.w2.weight
  193. Parameter (name=transformer.h.11.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  194. name: transformer.h.11.ln_1.weight->transformer.layers.11.attention_norm.weight
  195. Parameter (name=transformer.h.11.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  196. name: transformer.h.11.attn.c_attn.weight->transformer.layers.11.attn.c_attn.weight
  197. Parameter (name=transformer.h.11.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  198. name: transformer.h.11.attn.c_attn.bias->transformer.layers.11.attn.c_attn.bias
  199. Parameter (name=transformer.h.11.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  200. name: transformer.h.11.attn.c_proj.weight->transformer.layers.11.attention.wo.weight
  201. Parameter (name=transformer.h.11.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  202. name: transformer.h.11.ln_2.weight->transformer.layers.11.ffn_norm.weight
  203. Parameter (name=transformer.h.11.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  204. name: transformer.h.11.mlp.w1.weight->transformer.layers.11.feed_forward.w1.weight
  205. Parameter (name=transformer.h.11.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  206. name: transformer.h.11.mlp.w2.weight->transformer.layers.11.feed_forward.w3.weight
  207. Parameter (name=transformer.h.11.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  208. name: transformer.h.11.mlp.c_proj.weight->transformer.layers.11.feed_forward.w2.weight
  209. Parameter (name=transformer.h.12.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  210. name: transformer.h.12.ln_1.weight->transformer.layers.12.attention_norm.weight
  211. Parameter (name=transformer.h.12.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  212. name: transformer.h.12.attn.c_attn.weight->transformer.layers.12.attn.c_attn.weight
  213. Parameter (name=transformer.h.12.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  214. name: transformer.h.12.attn.c_attn.bias->transformer.layers.12.attn.c_attn.bias
  215. Parameter (name=transformer.h.12.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  216. name: transformer.h.12.attn.c_proj.weight->transformer.layers.12.attention.wo.weight
  217. Parameter (name=transformer.h.12.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  218. name: transformer.h.12.ln_2.weight->transformer.layers.12.ffn_norm.weight
  219. Parameter (name=transformer.h.12.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  220. name: transformer.h.12.mlp.w1.weight->transformer.layers.12.feed_forward.w1.weight
  221. Parameter (name=transformer.h.12.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  222. name: transformer.h.12.mlp.w2.weight->transformer.layers.12.feed_forward.w3.weight
  223. Parameter (name=transformer.h.12.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  224. name: transformer.h.12.mlp.c_proj.weight->transformer.layers.12.feed_forward.w2.weight
  225. Parameter (name=transformer.h.13.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  226. name: transformer.h.13.ln_1.weight->transformer.layers.13.attention_norm.weight
  227. Parameter (name=transformer.h.13.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  228. name: transformer.h.13.attn.c_attn.weight->transformer.layers.13.attn.c_attn.weight
  229. Parameter (name=transformer.h.13.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  230. name: transformer.h.13.attn.c_attn.bias->transformer.layers.13.attn.c_attn.bias
  231. Parameter (name=transformer.h.13.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  232. name: transformer.h.13.attn.c_proj.weight->transformer.layers.13.attention.wo.weight
  233. Parameter (name=transformer.h.13.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  234. name: transformer.h.13.ln_2.weight->transformer.layers.13.ffn_norm.weight
  235. Parameter (name=transformer.h.13.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  236. name: transformer.h.13.mlp.w1.weight->transformer.layers.13.feed_forward.w1.weight
  237. Parameter (name=transformer.h.13.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  238. name: transformer.h.13.mlp.w2.weight->transformer.layers.13.feed_forward.w3.weight
  239. Parameter (name=transformer.h.13.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  240. name: transformer.h.13.mlp.c_proj.weight->transformer.layers.13.feed_forward.w2.weight
  241. Parameter (name=transformer.h.14.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  242. name: transformer.h.14.ln_1.weight->transformer.layers.14.attention_norm.weight
  243. Parameter (name=transformer.h.14.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  244. name: transformer.h.14.attn.c_attn.weight->transformer.layers.14.attn.c_attn.weight
  245. Parameter (name=transformer.h.14.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  246. name: transformer.h.14.attn.c_attn.bias->transformer.layers.14.attn.c_attn.bias
  247. Parameter (name=transformer.h.14.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  248. name: transformer.h.14.attn.c_proj.weight->transformer.layers.14.attention.wo.weight
  249. Parameter (name=transformer.h.14.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  250. name: transformer.h.14.ln_2.weight->transformer.layers.14.ffn_norm.weight
  251. Parameter (name=transformer.h.14.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  252. name: transformer.h.14.mlp.w1.weight->transformer.layers.14.feed_forward.w1.weight
  253. Parameter (name=transformer.h.14.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  254. name: transformer.h.14.mlp.w2.weight->transformer.layers.14.feed_forward.w3.weight
  255. Parameter (name=transformer.h.14.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  256. name: transformer.h.14.mlp.c_proj.weight->transformer.layers.14.feed_forward.w2.weight
  257. Parameter (name=transformer.h.15.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  258. name: transformer.h.15.ln_1.weight->transformer.layers.15.attention_norm.weight
  259. Parameter (name=transformer.h.15.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  260. name: transformer.h.15.attn.c_attn.weight->transformer.layers.15.attn.c_attn.weight
  261. Parameter (name=transformer.h.15.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  262. name: transformer.h.15.attn.c_attn.bias->transformer.layers.15.attn.c_attn.bias
  263. Parameter (name=transformer.h.15.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  264. name: transformer.h.15.attn.c_proj.weight->transformer.layers.15.attention.wo.weight
  265. Parameter (name=transformer.h.15.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  266. name: transformer.h.15.ln_2.weight->transformer.layers.15.ffn_norm.weight
  267. Parameter (name=transformer.h.15.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  268. name: transformer.h.15.mlp.w1.weight->transformer.layers.15.feed_forward.w1.weight
  269. Parameter (name=transformer.h.15.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  270. name: transformer.h.15.mlp.w2.weight->transformer.layers.15.feed_forward.w3.weight
  271. Parameter (name=transformer.h.15.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  272. name: transformer.h.15.mlp.c_proj.weight->transformer.layers.15.feed_forward.w2.weight
  273. Parameter (name=transformer.h.16.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  274. name: transformer.h.16.ln_1.weight->transformer.layers.16.attention_norm.weight
  275. Parameter (name=transformer.h.16.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  276. name: transformer.h.16.attn.c_attn.weight->transformer.layers.16.attn.c_attn.weight
  277. Parameter (name=transformer.h.16.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  278. name: transformer.h.16.attn.c_attn.bias->transformer.layers.16.attn.c_attn.bias
  279. Parameter (name=transformer.h.16.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  280. name: transformer.h.16.attn.c_proj.weight->transformer.layers.16.attention.wo.weight
  281. Parameter (name=transformer.h.16.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  282. name: transformer.h.16.ln_2.weight->transformer.layers.16.ffn_norm.weight
  283. Parameter (name=transformer.h.16.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  284. name: transformer.h.16.mlp.w1.weight->transformer.layers.16.feed_forward.w1.weight
  285. Parameter (name=transformer.h.16.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  286. name: transformer.h.16.mlp.w2.weight->transformer.layers.16.feed_forward.w3.weight
  287. Parameter (name=transformer.h.16.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  288. name: transformer.h.16.mlp.c_proj.weight->transformer.layers.16.feed_forward.w2.weight
  289. Parameter (name=transformer.h.17.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  290. name: transformer.h.17.ln_1.weight->transformer.layers.17.attention_norm.weight
  291. Parameter (name=transformer.h.17.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  292. name: transformer.h.17.attn.c_attn.weight->transformer.layers.17.attn.c_attn.weight
  293. Parameter (name=transformer.h.17.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  294. name: transformer.h.17.attn.c_attn.bias->transformer.layers.17.attn.c_attn.bias
  295. Parameter (name=transformer.h.17.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  296. name: transformer.h.17.attn.c_proj.weight->transformer.layers.17.attention.wo.weight
  297. Parameter (name=transformer.h.17.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  298. name: transformer.h.17.ln_2.weight->transformer.layers.17.ffn_norm.weight
  299. Parameter (name=transformer.h.17.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  300. name: transformer.h.17.mlp.w1.weight->transformer.layers.17.feed_forward.w1.weight
  301. Parameter (name=transformer.h.17.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  302. name: transformer.h.17.mlp.w2.weight->transformer.layers.17.feed_forward.w3.weight
  303. Parameter (name=transformer.h.17.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  304. name: transformer.h.17.mlp.c_proj.weight->transformer.layers.17.feed_forward.w2.weight
  305. Parameter (name=transformer.h.18.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  306. name: transformer.h.18.ln_1.weight->transformer.layers.18.attention_norm.weight
  307. Parameter (name=transformer.h.18.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  308. name: transformer.h.18.attn.c_attn.weight->transformer.layers.18.attn.c_attn.weight
  309. Parameter (name=transformer.h.18.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  310. name: transformer.h.18.attn.c_attn.bias->transformer.layers.18.attn.c_attn.bias
  311. Parameter (name=transformer.h.18.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  312. name: transformer.h.18.attn.c_proj.weight->transformer.layers.18.attention.wo.weight
  313. Parameter (name=transformer.h.18.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  314. name: transformer.h.18.ln_2.weight->transformer.layers.18.ffn_norm.weight
  315. Parameter (name=transformer.h.18.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  316. name: transformer.h.18.mlp.w1.weight->transformer.layers.18.feed_forward.w1.weight
  317. Parameter (name=transformer.h.18.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  318. name: transformer.h.18.mlp.w2.weight->transformer.layers.18.feed_forward.w3.weight
  319. Parameter (name=transformer.h.18.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  320. name: transformer.h.18.mlp.c_proj.weight->transformer.layers.18.feed_forward.w2.weight
  321. Parameter (name=transformer.h.19.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  322. name: transformer.h.19.ln_1.weight->transformer.layers.19.attention_norm.weight
  323. Parameter (name=transformer.h.19.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  324. name: transformer.h.19.attn.c_attn.weight->transformer.layers.19.attn.c_attn.weight
  325. Parameter (name=transformer.h.19.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  326. name: transformer.h.19.attn.c_attn.bias->transformer.layers.19.attn.c_attn.bias
  327. Parameter (name=transformer.h.19.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  328. name: transformer.h.19.attn.c_proj.weight->transformer.layers.19.attention.wo.weight
  329. Parameter (name=transformer.h.19.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  330. name: transformer.h.19.ln_2.weight->transformer.layers.19.ffn_norm.weight
  331. Parameter (name=transformer.h.19.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  332. name: transformer.h.19.mlp.w1.weight->transformer.layers.19.feed_forward.w1.weight
  333. Parameter (name=transformer.h.19.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  334. name: transformer.h.19.mlp.w2.weight->transformer.layers.19.feed_forward.w3.weight
  335. Parameter (name=transformer.h.19.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  336. name: transformer.h.19.mlp.c_proj.weight->transformer.layers.19.feed_forward.w2.weight
  337. Parameter (name=transformer.h.20.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  338. name: transformer.h.20.ln_1.weight->transformer.layers.20.attention_norm.weight
  339. Parameter (name=transformer.h.20.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  340. name: transformer.h.20.attn.c_attn.weight->transformer.layers.20.attn.c_attn.weight
  341. Parameter (name=transformer.h.20.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  342. name: transformer.h.20.attn.c_attn.bias->transformer.layers.20.attn.c_attn.bias
  343. Parameter (name=transformer.h.20.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  344. name: transformer.h.20.attn.c_proj.weight->transformer.layers.20.attention.wo.weight
  345. Parameter (name=transformer.h.20.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  346. name: transformer.h.20.ln_2.weight->transformer.layers.20.ffn_norm.weight
  347. Parameter (name=transformer.h.20.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  348. name: transformer.h.20.mlp.w1.weight->transformer.layers.20.feed_forward.w1.weight
  349. Parameter (name=transformer.h.20.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  350. name: transformer.h.20.mlp.w2.weight->transformer.layers.20.feed_forward.w3.weight
  351. Parameter (name=transformer.h.20.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  352. name: transformer.h.20.mlp.c_proj.weight->transformer.layers.20.feed_forward.w2.weight
  353. Parameter (name=transformer.h.21.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  354. name: transformer.h.21.ln_1.weight->transformer.layers.21.attention_norm.weight
  355. Parameter (name=transformer.h.21.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  356. name: transformer.h.21.attn.c_attn.weight->transformer.layers.21.attn.c_attn.weight
  357. Parameter (name=transformer.h.21.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  358. name: transformer.h.21.attn.c_attn.bias->transformer.layers.21.attn.c_attn.bias
  359. Parameter (name=transformer.h.21.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  360. name: transformer.h.21.attn.c_proj.weight->transformer.layers.21.attention.wo.weight
  361. Parameter (name=transformer.h.21.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  362. name: transformer.h.21.ln_2.weight->transformer.layers.21.ffn_norm.weight
  363. Parameter (name=transformer.h.21.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  364. name: transformer.h.21.mlp.w1.weight->transformer.layers.21.feed_forward.w1.weight
  365. Parameter (name=transformer.h.21.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  366. name: transformer.h.21.mlp.w2.weight->transformer.layers.21.feed_forward.w3.weight
  367. Parameter (name=transformer.h.21.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  368. name: transformer.h.21.mlp.c_proj.weight->transformer.layers.21.feed_forward.w2.weight
  369. Parameter (name=transformer.h.22.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  370. name: transformer.h.22.ln_1.weight->transformer.layers.22.attention_norm.weight
  371. Parameter (name=transformer.h.22.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  372. name: transformer.h.22.attn.c_attn.weight->transformer.layers.22.attn.c_attn.weight
  373. Parameter (name=transformer.h.22.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  374. name: transformer.h.22.attn.c_attn.bias->transformer.layers.22.attn.c_attn.bias
  375. Parameter (name=transformer.h.22.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  376. name: transformer.h.22.attn.c_proj.weight->transformer.layers.22.attention.wo.weight
  377. Parameter (name=transformer.h.22.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  378. name: transformer.h.22.ln_2.weight->transformer.layers.22.ffn_norm.weight
  379. Parameter (name=transformer.h.22.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  380. name: transformer.h.22.mlp.w1.weight->transformer.layers.22.feed_forward.w1.weight
  381. Parameter (name=transformer.h.22.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  382. name: transformer.h.22.mlp.w2.weight->transformer.layers.22.feed_forward.w3.weight
  383. Parameter (name=transformer.h.22.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  384. name: transformer.h.22.mlp.c_proj.weight->transformer.layers.22.feed_forward.w2.weight
  385. Parameter (name=transformer.h.23.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  386. name: transformer.h.23.ln_1.weight->transformer.layers.23.attention_norm.weight
  387. Parameter (name=transformer.h.23.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  388. name: transformer.h.23.attn.c_attn.weight->transformer.layers.23.attn.c_attn.weight
  389. Parameter (name=transformer.h.23.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  390. name: transformer.h.23.attn.c_attn.bias->transformer.layers.23.attn.c_attn.bias
  391. Parameter (name=transformer.h.23.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  392. name: transformer.h.23.attn.c_proj.weight->transformer.layers.23.attention.wo.weight
  393. Parameter (name=transformer.h.23.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  394. name: transformer.h.23.ln_2.weight->transformer.layers.23.ffn_norm.weight
  395. Parameter (name=transformer.h.23.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  396. name: transformer.h.23.mlp.w1.weight->transformer.layers.23.feed_forward.w1.weight
  397. Parameter (name=transformer.h.23.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  398. name: transformer.h.23.mlp.w2.weight->transformer.layers.23.feed_forward.w3.weight
  399. Parameter (name=transformer.h.23.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  400. name: transformer.h.23.mlp.c_proj.weight->transformer.layers.23.feed_forward.w2.weight
  401. Parameter (name=transformer.h.24.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  402. name: transformer.h.24.ln_1.weight->transformer.layers.24.attention_norm.weight
  403. Parameter (name=transformer.h.24.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  404. name: transformer.h.24.attn.c_attn.weight->transformer.layers.24.attn.c_attn.weight
  405. Parameter (name=transformer.h.24.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  406. name: transformer.h.24.attn.c_attn.bias->transformer.layers.24.attn.c_attn.bias
  407. Parameter (name=transformer.h.24.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  408. name: transformer.h.24.attn.c_proj.weight->transformer.layers.24.attention.wo.weight
  409. Parameter (name=transformer.h.24.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  410. name: transformer.h.24.ln_2.weight->transformer.layers.24.ffn_norm.weight
  411. Parameter (name=transformer.h.24.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  412. name: transformer.h.24.mlp.w1.weight->transformer.layers.24.feed_forward.w1.weight
  413. Parameter (name=transformer.h.24.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  414. name: transformer.h.24.mlp.w2.weight->transformer.layers.24.feed_forward.w3.weight
  415. Parameter (name=transformer.h.24.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  416. name: transformer.h.24.mlp.c_proj.weight->transformer.layers.24.feed_forward.w2.weight
  417. Parameter (name=transformer.h.25.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  418. name: transformer.h.25.ln_1.weight->transformer.layers.25.attention_norm.weight
  419. Parameter (name=transformer.h.25.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  420. name: transformer.h.25.attn.c_attn.weight->transformer.layers.25.attn.c_attn.weight
  421. Parameter (name=transformer.h.25.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  422. name: transformer.h.25.attn.c_attn.bias->transformer.layers.25.attn.c_attn.bias
  423. Parameter (name=transformer.h.25.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  424. name: transformer.h.25.attn.c_proj.weight->transformer.layers.25.attention.wo.weight
  425. Parameter (name=transformer.h.25.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  426. name: transformer.h.25.ln_2.weight->transformer.layers.25.ffn_norm.weight
  427. Parameter (name=transformer.h.25.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  428. name: transformer.h.25.mlp.w1.weight->transformer.layers.25.feed_forward.w1.weight
  429. Parameter (name=transformer.h.25.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  430. name: transformer.h.25.mlp.w2.weight->transformer.layers.25.feed_forward.w3.weight
  431. Parameter (name=transformer.h.25.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  432. name: transformer.h.25.mlp.c_proj.weight->transformer.layers.25.feed_forward.w2.weight
  433. Parameter (name=transformer.h.26.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  434. name: transformer.h.26.ln_1.weight->transformer.layers.26.attention_norm.weight
  435. Parameter (name=transformer.h.26.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  436. name: transformer.h.26.attn.c_attn.weight->transformer.layers.26.attn.c_attn.weight
  437. Parameter (name=transformer.h.26.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  438. name: transformer.h.26.attn.c_attn.bias->transformer.layers.26.attn.c_attn.bias
  439. Parameter (name=transformer.h.26.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  440. name: transformer.h.26.attn.c_proj.weight->transformer.layers.26.attention.wo.weight
  441. Parameter (name=transformer.h.26.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  442. name: transformer.h.26.ln_2.weight->transformer.layers.26.ffn_norm.weight
  443. Parameter (name=transformer.h.26.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  444. name: transformer.h.26.mlp.w1.weight->transformer.layers.26.feed_forward.w1.weight
  445. Parameter (name=transformer.h.26.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  446. name: transformer.h.26.mlp.w2.weight->transformer.layers.26.feed_forward.w3.weight
  447. Parameter (name=transformer.h.26.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  448. name: transformer.h.26.mlp.c_proj.weight->transformer.layers.26.feed_forward.w2.weight
  449. Parameter (name=transformer.h.27.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  450. name: transformer.h.27.ln_1.weight->transformer.layers.27.attention_norm.weight
  451. Parameter (name=transformer.h.27.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  452. name: transformer.h.27.attn.c_attn.weight->transformer.layers.27.attn.c_attn.weight
  453. Parameter (name=transformer.h.27.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  454. name: transformer.h.27.attn.c_attn.bias->transformer.layers.27.attn.c_attn.bias
  455. Parameter (name=transformer.h.27.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  456. name: transformer.h.27.attn.c_proj.weight->transformer.layers.27.attention.wo.weight
  457. Parameter (name=transformer.h.27.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  458. name: transformer.h.27.ln_2.weight->transformer.layers.27.ffn_norm.weight
  459. Parameter (name=transformer.h.27.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  460. name: transformer.h.27.mlp.w1.weight->transformer.layers.27.feed_forward.w1.weight
  461. Parameter (name=transformer.h.27.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  462. name: transformer.h.27.mlp.w2.weight->transformer.layers.27.feed_forward.w3.weight
  463. Parameter (name=transformer.h.27.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  464. name: transformer.h.27.mlp.c_proj.weight->transformer.layers.27.feed_forward.w2.weight
  465. Parameter (name=transformer.h.28.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  466. name: transformer.h.28.ln_1.weight->transformer.layers.28.attention_norm.weight
  467. Parameter (name=transformer.h.28.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  468. name: transformer.h.28.attn.c_attn.weight->transformer.layers.28.attn.c_attn.weight
  469. Parameter (name=transformer.h.28.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  470. name: transformer.h.28.attn.c_attn.bias->transformer.layers.28.attn.c_attn.bias
  471. Parameter (name=transformer.h.28.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  472. name: transformer.h.28.attn.c_proj.weight->transformer.layers.28.attention.wo.weight
  473. Parameter (name=transformer.h.28.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  474. name: transformer.h.28.ln_2.weight->transformer.layers.28.ffn_norm.weight
  475. Parameter (name=transformer.h.28.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  476. name: transformer.h.28.mlp.w1.weight->transformer.layers.28.feed_forward.w1.weight
  477. Parameter (name=transformer.h.28.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  478. name: transformer.h.28.mlp.w2.weight->transformer.layers.28.feed_forward.w3.weight
  479. Parameter (name=transformer.h.28.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  480. name: transformer.h.28.mlp.c_proj.weight->transformer.layers.28.feed_forward.w2.weight
  481. Parameter (name=transformer.h.29.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  482. name: transformer.h.29.ln_1.weight->transformer.layers.29.attention_norm.weight
  483. Parameter (name=transformer.h.29.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  484. name: transformer.h.29.attn.c_attn.weight->transformer.layers.29.attn.c_attn.weight
  485. Parameter (name=transformer.h.29.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  486. name: transformer.h.29.attn.c_attn.bias->transformer.layers.29.attn.c_attn.bias
  487. Parameter (name=transformer.h.29.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  488. name: transformer.h.29.attn.c_proj.weight->transformer.layers.29.attention.wo.weight
  489. Parameter (name=transformer.h.29.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  490. name: transformer.h.29.ln_2.weight->transformer.layers.29.ffn_norm.weight
  491. Parameter (name=transformer.h.29.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  492. name: transformer.h.29.mlp.w1.weight->transformer.layers.29.feed_forward.w1.weight
  493. Parameter (name=transformer.h.29.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  494. name: transformer.h.29.mlp.w2.weight->transformer.layers.29.feed_forward.w3.weight
  495. Parameter (name=transformer.h.29.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  496. name: transformer.h.29.mlp.c_proj.weight->transformer.layers.29.feed_forward.w2.weight
  497. Parameter (name=transformer.h.30.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  498. name: transformer.h.30.ln_1.weight->transformer.layers.30.attention_norm.weight
  499. Parameter (name=transformer.h.30.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  500. name: transformer.h.30.attn.c_attn.weight->transformer.layers.30.attn.c_attn.weight
  501. Parameter (name=transformer.h.30.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  502. name: transformer.h.30.attn.c_attn.bias->transformer.layers.30.attn.c_attn.bias
  503. Parameter (name=transformer.h.30.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  504. name: transformer.h.30.attn.c_proj.weight->transformer.layers.30.attention.wo.weight
  505. Parameter (name=transformer.h.30.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  506. name: transformer.h.30.ln_2.weight->transformer.layers.30.ffn_norm.weight
  507. Parameter (name=transformer.h.30.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  508. name: transformer.h.30.mlp.w1.weight->transformer.layers.30.feed_forward.w1.weight
  509. Parameter (name=transformer.h.30.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  510. name: transformer.h.30.mlp.w2.weight->transformer.layers.30.feed_forward.w3.weight
  511. Parameter (name=transformer.h.30.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  512. name: transformer.h.30.mlp.c_proj.weight->transformer.layers.30.feed_forward.w2.weight
  513. Parameter (name=transformer.h.31.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  514. name: transformer.h.31.ln_1.weight->transformer.layers.31.attention_norm.weight
  515. Parameter (name=transformer.h.31.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
  516. name: transformer.h.31.attn.c_attn.weight->transformer.layers.31.attn.c_attn.weight
  517. Parameter (name=transformer.h.31.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
  518. name: transformer.h.31.attn.c_attn.bias->transformer.layers.31.attn.c_attn.bias
  519. Parameter (name=transformer.h.31.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
  520. name: transformer.h.31.attn.c_proj.weight->transformer.layers.31.attention.wo.weight
  521. Parameter (name=transformer.h.31.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  522. name: transformer.h.31.ln_2.weight->transformer.layers.31.ffn_norm.weight
  523. Parameter (name=transformer.h.31.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  524. name: transformer.h.31.mlp.w1.weight->transformer.layers.31.feed_forward.w1.weight
  525. Parameter (name=transformer.h.31.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
  526. name: transformer.h.31.mlp.w2.weight->transformer.layers.31.feed_forward.w3.weight
  527. Parameter (name=transformer.h.31.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
  528. name: transformer.h.31.mlp.c_proj.weight->transformer.layers.31.feed_forward.w2.weight
  529. Parameter (name=transformer.ln_f.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
  530. Parameter (name=lm_head.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
  531. Saving converted weights to /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt...
  532. Done
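
Before moving on to inference, it can help to confirm that the converted MindSpore checkpoint was actually written out. A minimal check (the path follows the conversion log above; for a 7B fp32 model the file should be on the order of tens of GB):

ls -lh /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt
# if the .ckpt file is missing or suspiciously small, rerun convert_weight.py and check its output for errors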

Configure the Python module search path and launch the inference script. Adding the mindformers source tree to PYTHONPATH lets infer_qwen.py import the mindformers package directly from the repository:

cd /data/qwen/mindformers/research/qwen

export PYTHONPATH=/data/qwen/mindformers:$PYTHONPATH

python3 infer_qwen.py

  1. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  2. setattr(self, word, getattr(machar, word).flat[0])
  3. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  4. return self._float_to_str(self.smallest_subnormal)
  5. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  6. setattr(self, word, getattr(machar, word).flat[0])
  7. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  8. return self._float_to_str(self.smallest_subnormal)
  9. [Warning]Can not find libascendalog.so
  10. [Warning]Can not find libascendalog.so
  11. Traceback (most recent call last):
  12. File "/data/qwen/mindformers/research/qwen/infer_qwen.py", line 4, in <module>
  13. from mindformers.trainer import Trainer
  14. File "/data/qwen/mindformers/mindformers/__init__.py", line 17, in <module>
  15. from mindformers import core, auto_class, dataset, \
  16. File "/data/qwen/mindformers/mindformers/core/__init__.py", line 19, in <module>
  17. from .metric import build_metric
  18. File "/data/qwen/mindformers/mindformers/core/metric/__init__.py", line 17, in <module>
  19. from .metric import *
  20. File "/data/qwen/mindformers/mindformers/core/metric/metric.py", line 37, in <module>
  21. from mindformers.models import BasicTokenizer
  22. File "/data/qwen/mindformers/mindformers/models/__init__.py", line 21, in <module>
  23. from .blip2 import *
  24. File "/data/qwen/mindformers/mindformers/models/blip2/__init__.py", line 17, in <module>
  25. from .blip2_config import Blip2Config
  26. File "/data/qwen/mindformers/mindformers/models/blip2/blip2_config.py", line 23, in <module>
  27. from mindformers.models.llama import LlamaConfig
  28. File "/data/qwen/mindformers/mindformers/models/llama/__init__.py", line 18, in <module>
  29. from .llama import LlamaForCausalLM, LlamaForCausalLMWithLora, LlamaModel
  30. File "/data/qwen/mindformers/mindformers/models/llama/llama.py", line 30, in <module>
  31. from mindspore.nn.layer.flash_attention import FlashAttention
  32. File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/nn/layer/flash_attention.py", line 24, in <module>
  33. from mindspore.ops._op_impl._custom_op.flash_attention.flash_attention_impl import get_flash_attention
  34. File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/__init__.py", line 17, in <module>
  35. from mindspore.ops._op_impl._custom_op.dsd_impl import dsd_matmul
  36. File "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/dsd_impl.py", line 17, in <module>
  37. from te import tik
  38. File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/te/__init__.py", line 128, in <module>
  39. from tbe import tvm
  40. File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/__init__.py", line 44, in <module>
  41. import tvm
  42. File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/__init__.py", line 26, in <module>
  43. from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  44. File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/__init__.py", line 28, in <module>
  45. from .base import register_error
  46. File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 72, in <module>
  47. _LIB, _LIB_NAME = _load_lib()
  48. File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 52, in _load_lib
  49. lib_path = libinfo.find_lib_path()
  50. File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/libinfo.py", line 147, in find_lib_path
  51. raise RuntimeError(message)
  52. RuntimeError: Cannot find the files.
  53. List of candidates:
  54. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm.so
  55. /usr/local/Ascend/driver/libtvm.so
  56. /data/qwen/mindformers/research/qwen/libtvm.so
  57. /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm.so
  58. /usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm.so
  59. /root/miniconda3/envs/mindspore2.2_py39/bin/libtvm.so
  60. /root/miniconda3/condabin/libtvm.so
  61. /usr/local/sbin/libtvm.so
  62. /usr/local/bin/libtvm.so
  63. /usr/sbin/libtvm.so
  64. /usr/bin/libtvm.so
  65. /usr/sbin/libtvm.so
  66. /usr/bin/libtvm.so
  67. /usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm.so
  68. /usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm.so
  69. /root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm_runtime.so
  70. /usr/local/Ascend/driver/libtvm_runtime.so
  71. /data/qwen/mindformers/research/qwen/libtvm_runtime.so
  72. /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm_runtime.so
  73. /usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm_runtime.so
  74. /root/miniconda3/envs/mindspore2.2_py39/bin/libtvm_runtime.so
  75. /root/miniconda3/condabin/libtvm_runtime.so
  76. /usr/local/sbin/libtvm_runtime.so
  77. /usr/local/bin/libtvm_runtime.so
  78. /usr/sbin/libtvm_runtime.so
  79. /usr/bin/libtvm_runtime.so
  80. /usr/sbin/libtvm_runtime.so
  81. /usr/bin/libtvm_runtime.so
  82. /usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm_runtime.so
  83. /usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm_runtime.so

Judging from the error message, the failure comes from files missing in the Ascend toolkit / chip-architecture setup: the TBE/TVM runtime library libtvm.so cannot be found in any of the candidate paths listed above. This is not investigated further here.
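
One direction worth trying (not verified here) is to make sure the CANN toolkit environment is sourced inside the container before launching the script, so that the toolkit's library paths end up on LD_LIBRARY_PATH, and to check whether libtvm.so is actually present under the toolkit installation. The paths below assume the default install location seen in the log:

# source the CANN environment script (assumed default path) so the toolkit libraries are on the search path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# look for the TVM libraries under the toolkit; if nothing is found, the toolkit install inside the image is likely incomplete
find /usr/local/Ascend/ascend-toolkit -name "libtvm*.so" 2>/dev/null
# then retry the inference script
python3 infer_qwen.py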
