MindSpore 25-Day Learning Camp, Day 17 | LLM: GPT-2 Text Summarization Based on MindSpore

Check-in

Contents

Check-in
Environment Preparation
Preparation
Data Loading and Preprocessing
BertTokenizer
Partial Output
Model Construction
GPT-2 Model Structure Output
Training Process
Partial Output
Partial Output 2 (Reduced Training Data)
Inference Process


Environment Preparation

pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
pip install tokenizers==0.15.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
# This case was verified against mindnlp 0.3.1; if the example fails to run, pin the version with `!pip install mindnlp==0.3.1`
pip install mindnlp
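A quick way to confirm the environment after installation (my own check, not part of the original steps):

import mindspore

print(mindspore.__version__)  # expect 2.2.14
# for mindnlp, `pip show mindnlp` prints the installed version (0.3.1 if pinned)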

Preparation

The nlpcc2017 summarization dataset consists of news articles paired with their summaries, 50,000 samples in total.

Source: nlpcc2017 summarization dataset
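Each line of the data file is a JSON object with an article field and a summarization field. A made-up record of the same shape (the text below is a placeholder, not a real sample):

import json

# one line of the nlpcc2017 file, with placeholder text
line = '{"summarization": "示例摘要文本", "article": "示例新闻正文……"}'
record = json.loads(line)
print(record['article'])
print(record['summarization'])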

Data Loading and Preprocessing

  • Raw data format:
  1. article: [CLS] article_context [SEP]
  2. summary: [CLS] summary_context [SEP]
  • Data format after preprocessing:
[CLS] article_context [SEP] summary_context [SEP]
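A minimal sketch of how this merged format arises (assuming the bert-base-chinese tokenizer loaded in the next section): passing the summary as text_pair makes the tokenizer emit a single [CLS] ... [SEP] ... [SEP] sequence.

from mindnlp.transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
sample = tokenizer(text='今天天气很好', text_pair='好天气')
print(tokenizer.decode(sample['input_ids']))
# expected form: [CLS] 今 天 天 气 很 好 [SEP] 好 天 气 [SEP]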

BertTokenizer

Since GPT-2 does not provide a Chinese tokenizer, BertTokenizer is used instead. The code is as follows:

from mindspore.dataset import TextFileDataset
import json
import numpy as np
from mindnlp.transformers import BertTokenizer

# preprocess dataset
def process_dataset(dataset, tokenizer, batch_size=6, max_seq_len=1024, shuffle=False):
    def read_map(text):
        data = json.loads(text.tobytes())
        return np.array(data['article']), np.array(data['summarization'])

    def merge_and_pad(article, summary):
        # tokenization; pad to max_seq_length, only truncate the article
        tokenized = tokenizer(text=article, text_pair=summary,
                              padding='max_length', truncation='only_first', max_length=max_seq_len)
        # labels reuse input_ids; the shift-right is done inside the model's loss
        return tokenized['input_ids'], tokenized['input_ids']

    dataset = dataset.map(read_map, 'text', ['article', 'summary'])
    # change column names to input_ids and labels for the following training
    dataset = dataset.map(merge_and_pad, ['article', 'summary'], ['input_ids', 'labels'])
    dataset = dataset.batch(batch_size)
    if shuffle:
        dataset = dataset.shuffle(batch_size)
    return dataset

# load dataset; `path` points to the downloaded nlpcc2017 data file
dataset = TextFileDataset(str(path), shuffle=False)
print(dataset.get_dataset_size())  ### 50000

# split into training and testing dataset
train_dataset, test_dataset = dataset.split([0.9, 0.1], randomize=False)
print(len(train_dataset))  ### 45000

# We use BertTokenizer for tokenizing chinese context.
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
len(tokenizer)

train_dataset = process_dataset(train_dataset, tokenizer, batch_size=4)
## next(train_dataset.create_tuple_iterator())
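A quick sanity check on the pipeline (my own sketch, not output from the original notebook): with batch_size=4 and max_seq_len=1024, each batch should yield an input_ids tensor and a labels tensor of shape (4, 1024).

input_ids, labels = next(train_dataset.create_tuple_iterator())
print(input_ids.shape, labels.shape)  # expected: (4, 1024) (4, 1024)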

Partial Output

Model Construction

As shown below, the model is implemented with two classes:

  1. GPT2ForSummarization, the summarization model; note the shift-right operation (a small illustration follows the code).
  2. LinearWithWarmUp, a dynamic (warmup-decay) learning-rate schedule.
from mindspore import ops
from mindspore import nn
from mindspore.nn.learning_rate_schedule import LearningRateSchedule
from mindnlp.transformers import GPT2Config, GPT2LMHeadModel
from mindnlp._legacy.engine import Trainer
from mindnlp._legacy.engine.callbacks import CheckpointCallback

class GPT2ForSummarization(GPT2LMHeadModel):
    def construct(
        self,
        input_ids=None,
        attention_mask=None,
        labels=None,
    ):
        outputs = super().construct(input_ids=input_ids, attention_mask=attention_mask)
        # shift right: the logits at position t are scored against the label at t+1
        shift_logits = outputs.logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        # Flatten the tokens
        loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]),
                                 shift_labels.view(-1), ignore_index=tokenizer.pad_token_id)
        return loss

class LinearWithWarmUp(LearningRateSchedule):
    """
    Warmup-decay learning rate.
    """
    def __init__(self, learning_rate, num_warmup_steps, num_training_steps):
        super().__init__()
        self.learning_rate = learning_rate
        self.num_warmup_steps = num_warmup_steps
        self.num_training_steps = num_training_steps

    def construct(self, global_step):
        if global_step < self.num_warmup_steps:
            return global_step / float(max(1, self.num_warmup_steps)) * self.learning_rate
        return ops.maximum(
            0.0, (self.num_training_steps - global_step) / (max(1, self.num_training_steps - self.num_warmup_steps))
        ) * self.learning_rate

## training hyperparameters
num_epochs = 1
warmup_steps = 2000
learning_rate = 1.5e-4
num_training_steps = num_epochs * train_dataset.get_dataset_size()

config = GPT2Config(vocab_size=len(tokenizer))
model = GPT2ForSummarization(config)

lr_scheduler = LinearWithWarmUp(learning_rate=learning_rate,
                                num_warmup_steps=warmup_steps,
                                num_training_steps=num_training_steps)
optimizer = nn.AdamWeightDecay(model.trainable_params(),
                               learning_rate=lr_scheduler)

# print the number of model parameters
print('number of model parameters: {}'.format(model.num_parameters()))
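To make the shift-right operation concrete, here is a minimal illustration with toy token ids (not from the notebook): the logits kept at position t are compared with the label at position t+1, so predictions and targets are offset by one.

import numpy as np

labels = np.array([101, 872, 1962, 102, 791, 1921, 102])  # hypothetical token ids
shift_inputs = labels[:-1]   # positions whose logits are kept (logits[..., :-1, :])
shift_targets = labels[1:]   # the tokens those positions must predict (labels[..., 1:])
for x, y in zip(shift_inputs, shift_targets):
    print(f'position holding {x} is trained to predict {y}')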

GPT-2 Model Structure Output

1. Top-level class: GPT2ForSummarization.

2. Second level: the GPT2Model layer, the transformer structure and the core of the model.

3. Second level: the lm_head Dense (fully connected) layer, dim[in, out] = [768, 21128].

4. The third-level components under GPT2Model:

        >> wte embedding layer: dim[in, out] = [21128, 768], i.e. 21,128 vocabulary entries, each mapped to a 768-dimensional vector.

        >> wpe embedding layer: dim[in, out] = [1024, 768] (position embeddings).

        >> drop layer.

        >> h, the stack of hidden layers: the body of the Transformer, containing 12 GPT2Blocks.

        >> ln_f, the final LayerNorm.

5. Structure of a GPT2Block:

        >> ln_1: LayerNorm, normalizes the input before the attention mechanism.

        >> attn: GPT2Attention, the self-attention mechanism that computes attention weights across positions of the input sequence. It contains four sub-modules: Conv1D (c_attn), Conv1D (c_proj), CustomDropout (attn_dropout), CustomDropout (resid_dropout).

        >> ln_2: LayerNorm, normalization after self-attention.

        >> mlp: GPT2MLP, a multilayer perceptron that applies a further non-linear transformation to the attention output. It contains: Conv1D (c_fc), Conv1D (c_proj), GELU, CustomDropout.
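From the dimensions above, a rough parameter count can be sketched (my own estimate, ignoring the lm_head and any weight tying; the exact value is whatever model.num_parameters() prints):

hidden, vocab, n_pos, n_layer = 768, 21128, 1024, 12

wte = vocab * hidden                          # token embedding: 21128 x 768
wpe = n_pos * hidden                          # position embedding: 1024 x 768

# one GPT2Block: Conv1D weights + biases, plus two LayerNorms
c_attn   = hidden * 3 * hidden + 3 * hidden   # fused q/k/v projection
attn_out = hidden * hidden + hidden           # attention output projection
c_fc     = hidden * 4 * hidden + 4 * hidden   # MLP expansion
mlp_out  = 4 * hidden * hidden + hidden       # MLP projection
norms    = 2 * 2 * hidden                     # ln_1 and ln_2 (weight + bias)
block = c_attn + attn_out + c_fc + mlp_out + norms

total = wte + wpe + n_layer * block + 2 * hidden  # + final ln_f
print(f'approx. transformer parameters: {total:,}')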
 

print(model)

GPT2ForSummarization<
  (transformer): GPT2Model<
    (wte): Embedding<vocab_size=21128, embedding_size=768, use_one_hot=False, weight=Parameter (Tensor(shape=[21128, 768], dtype=Float32, value=[...], name=transformer.wte.weight), requires_grad=True), dtype=Float32, padding_idx=None>
    (wpe): Embedding<vocab_size=1024, embedding_size=768, use_one_hot=False, weight=Parameter (Tensor(shape=[1024, 768], dtype=Float32, value=[...], name=transformer.wpe.weight), requires_grad=True), dtype=Float32, padding_idx=None>
    (drop): CustomDropout<>
    (h): CellList<
      (0): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.0.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.0.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.0.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.0.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (1): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.1.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.1.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.1.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.1.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (2): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.2.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.2.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.2.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.2.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (3): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.3.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.3.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.3.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.3.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (4): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.4.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.4.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.4.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.4.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (5): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.5.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.5.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.5.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.5.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (6): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.6.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.6.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.6.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.6.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (7): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.7.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.7.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.7.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.7.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (8): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.8.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.8.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.8.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.8.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (9): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.9.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.9.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.9.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.9.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (10): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.10.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.10.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.10.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.10.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      (11): GPT2Block<
        (ln_1): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.11.ln_1.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.11.ln_1.bias), requires_grad=True)>
        (attn): GPT2Attention<
          (c_attn): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (attn_dropout): CustomDropout<>
          (resid_dropout): CustomDropout<>
          >
        (ln_2): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.11.ln_2.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.h.11.ln_2.bias), requires_grad=True)>
        (mlp): GPT2MLP<
          (c_fc): Conv1D<
            (matmul): Matmul<>
            >
          (c_proj): Conv1D<
            (matmul): Matmul<>
            >
          (act): GELU<>
          (dropout): CustomDropout<>
          >
        >
      >
    (ln_f): LayerNorm<normalized_shape=[768], begin_norm_axis=-1, begin_params_axis=-1, weight=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.ln_f.weight), requires_grad=True), bias=Parameter (Tensor(shape=[768], dtype=Float32, value=[...], name=transformer.ln_f.bias), requires_grad=True)>
    >
  (lm_head): Dense<input_channels=768, output_channels=21128>
  >

Training Process

from mindspore import nn
from mindnlp.transformers import GPT2Config, GPT2LMHeadModel
from mindnlp._legacy.engine import Trainer
from mindnlp._legacy.engine.callbacks import CheckpointCallback

# print the number of model parameters
print('number of model parameters: {}'.format(model.num_parameters()))

ckpoint_cb = CheckpointCallback(save_path='checkpoint', ckpt_name='gpt2_summarization',
                                epochs=1, keep_checkpoint_max=2)

trainer = Trainer(network=model,
                  train_dataset=train_dataset,
                  epochs=1,
                  optimizer=optimizer,
                  callbacks=ckpoint_cb)
trainer.set_amp(level='O1')  # enable mixed precision
trainer.run(tgt_columns="labels")

Partial Output

Note: higher-spec compute is recommended, as training takes a long time.

Partial Output 2 (Reduced Training Data)

The notebook for this activity can only run for 8 hours continuously, and the goal here is not performance tuning, so I reduced the training data to one tenth (one way to do this is sketched below); the partial output under this setting follows.
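One way to shrink the data (my own sketch, not shown in the original notebook) is to keep only the first tenth of the batched training set with MindSpore's take():

# keep roughly 1/10 of the training batches (applied after process_dataset/batching)
num_batches = train_dataset.get_dataset_size()
train_dataset = train_dataset.take(num_batches // 10)
print('batches used for training:', train_dataset.get_dataset_size())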

Inference Process

## convert token-id tensors back to Chinese text
def process_test_dataset(dataset, tokenizer, batch_size=1, max_seq_len=1024, max_summary_len=100):
    def read_map(text):
        data = json.loads(text.tobytes())
        return np.array(data['article']), np.array(data['summarization'])

    def pad(article):
        # leave room for the generated summary: truncate the article to max_seq_len - max_summary_len
        tokenized = tokenizer(text=article, truncation=True, max_length=max_seq_len - max_summary_len)
        return tokenized['input_ids']

    dataset = dataset.map(read_map, 'text', ['article', 'summary'])
    dataset = dataset.map(pad, 'article', ['input_ids'])
    dataset = dataset.batch(batch_size)
    return dataset

test_dataset = process_test_dataset(test_dataset, tokenizer, batch_size=1)
print(next(test_dataset.create_tuple_iterator(output_numpy=True)))

model = GPT2LMHeadModel.from_pretrained('./checkpoint/gpt2_summarization_epoch_0.ckpt', config=config)
model.set_train(False)
model.config.eos_token_id = model.config.sep_token_id

i = 0
for (input_ids, raw_summary) in test_dataset.create_tuple_iterator():
    output_ids = model.generate(input_ids, max_new_tokens=50, num_beams=5, no_repeat_ngram_size=2)
    output_text = tokenizer.decode(output_ids[0].tolist())
    print(output_text)
    i += 1
    if i == 1:
        break
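Because generation continues from the "[CLS] article [SEP]" prompt, output_text contains the article followed by the generated summary. A post-processing sketch of my own (not in the original) to keep only the generated part:

# the decoded string is "article [SEP] generated summary ..."; take the part after the first [SEP]
parts = output_text.split('[SEP]')
generated_summary = parts[1].strip() if len(parts) > 1 else output_text
print(generated_summary)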

The inference results of the model trained on the reduced data are shown here.
