
BERT / SentenceTransformer: self.sbert_model = SentenceTransformer("cardiffnlp/twitter-roberta-base-emot…")

1. Understanding the overall structure:

Step 1: create the word_embedding model

word_embedding_model = models.Transformer(model_name)
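As a minimal sketch of this step (the model name and max_seq_length below are illustrative assumptions, not values taken from the original script):

from sentence_transformers import models

model_name = 'distilbert-base-uncased'  # illustrative choice; any Hugging Face encoder checkpoint works
word_embedding_model = models.Transformer(model_name, max_seq_length=256)
print(word_embedding_model.get_word_embedding_dimension())  # 768 for DistilBERT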

Step 2: pooling

pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
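To make concrete what mean-token pooling computes, here is a simplified hand-rolled sketch (not the library's actual Pooling implementation): token embeddings are averaged over the positions that the attention mask marks as real tokens.

import torch

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len) with 1 for real tokens
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum the embeddings of non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sentence
    return summed / counts                           # (batch, hidden) sentence embeddings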

Step 3: the sentence-similarity model

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
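Once assembled, the model can already be used for inference. A small usage sketch (the sentences are made up, and util.cos_sim assumes a reasonably recent sentence-transformers version; older versions call it pytorch_cos_sim):

from sentence_transformers import util

embeddings = model.encode(['A man is playing guitar.', 'Someone plays an instrument.'], convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the two sentence embeddings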

Step 4: the data format, InputExample

Each row of the CSV file is wrapped as:

inp_example = InputExample(texts=[row['sentence1'], row['sentence2']], label=score)

Here InputExample is just a plain class with two attributes, texts and label: texts holds the two sentences and label holds the score.

Confusion point 1: the code below already reads the entire CSV file into memory. When a DataLoader is then used on top of it, is its only purpose to split the data into batches and shuffle their order? (See the note after the code.)

import csv
import gzip
from torch.utils.data import DataLoader
from sentence_transformers import InputExample

train_samples = []
dev_samples = []
test_samples = []
with gzip.open(sts_dataset_path, 'rt', encoding='utf-8') as fIn:
    reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        score = float(row['score']) / 5.0  # normalize the STS score from [0, 5] to [0, 1]
        inp_example = InputExample(texts=[row['sentence1'], row['sentence2']], label=score)
        if row['split'] == 'dev':
            dev_samples.append(inp_example)
        elif row['split'] == 'test':
            test_samples.append(inp_example)
        else:
            train_samples.append(inp_example)

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)
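On confusion point 1: as far as I can tell, yes — the DataLoader here only handles batching and shuffling of the InputExample objects that are already in memory. Tokenization happens later: model.fit() attaches the model's smart batching collate function to the DataLoader (this is based on reading the sentence-transformers source and may differ between versions). A rough sketch of what fit() later does with this DataLoader, using the loss module defined in Step 5:

# rough sketch, not the actual library code
train_dataloader.collate_fn = model.smart_batching_collate  # tokenizes a batch of InputExamples on the fly
for features, labels in train_dataloader:                   # features: one tokenized dict per sentence column
    loss_value = train_loss(features, labels)               # CosineSimilarityLoss is an nn.Module
    loss_value.backward()                                    # (optimizer step, scheduler, etc. omitted)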

Step 5: the similarity loss, CosineSimilarityLoss

train_loss = losses.CosineSimilarityLoss(model=model)
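Conceptually (a simplified sketch, not the library's exact forward code), CosineSimilarityLoss embeds both sentences with the model, takes the cosine similarity of the two sentence embeddings, and regresses it onto the gold score with MSE — which is why the printed structure in 2.2 contains an MSELoss and an Identity transformation:

import torch.nn.functional as F

def cosine_similarity_loss(emb_a, emb_b, labels):
    # emb_a, emb_b: (batch, hidden) sentence embeddings; labels: gold similarity scores in [0, 1]
    pred = F.cosine_similarity(emb_a, emb_b)  # predicted similarity per sentence pair
    return F.mse_loss(pred, labels)           # squared error between prediction and label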

Step 6: training. evaluator is the evaluator used during training; it is not discussed further here.

model.fit(train_objectives=[(train_dataloader, train_loss)],
          evaluator=evaluator,
          epochs=num_epochs,
          evaluation_steps=1000,
          warmup_steps=warmup_steps,
          output_path=model_save_path
          )
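For completeness, warmup_steps and evaluator are usually prepared as in the library's standard STS training example (assumption: the sketch below mirrors that example rather than code shown in this article):

import math
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% of training steps for LR warm-up
evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_samples, name='sts-dev')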

2. Understanding the structural details:

Key files in the code (file name, key classes and variables, notes):

- configuration_auto.py
  - AutoConfig: loads model parameters through its from_pretrained method. It does two things: first, it uses PretrainedConfig to read the parameter file and turn it into a dict; second, it uses CONFIG_MAPPING to select the model configuration class and then calls from_dict to turn the dict into a configuration object, which is passed on to the AutoModel classes to build the model.
  - CONFIG_MAPPING: an OrderedDict constant used to select the defined model configuration class, e.g. DistilBertConfig (configuration_distilbert.py). These classes inherit from PretrainedConfig, so the from_dict method is available in every model configuration class.
  - Note: the job of a model configuration class is to provide the model's parameters.
- configuration_utils.py
  - PretrainedConfig: reads the model parameters, mainly through get_config_dict(), which returns a dict. It is also the base class of all model configuration classes.
- Transformer.py (this file is local to sentence-transformers)
  - Transformer class: it inherits from torch.nn.Module, so the model structure lives here and it has a forward method. Instantiating this class instantiates both the underlying model and its tokenizer.
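A small illustration of the AutoConfig flow described above (the checkpoint name is an arbitrary example):

from transformers import AutoConfig

config = AutoConfig.from_pretrained('distilbert-base-uncased')  # reads config.json and picks the class via CONFIG_MAPPING
print(type(config).__name__)  # DistilBertConfig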

2.1 Structure of word_embedding_model:
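The structure can be inspected the same way as in 2.2, by printing the module (a minimal sketch, assuming word_embedding_model from Step 1 is in scope):

print(word_embedding_model)  # prints the Transformer module and the Hugging Face model it wraps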

2.2 Structure of CosineSimilarityLoss:

The loss we train with is itself a model: CosineSimilarityLoss is a torch module that wraps the SentenceTransformer, so it does more than just compute a loss value. Printing it (e.g. print(train_loss)) gives the structure below.

CosineSimilarityLoss(
  (model): SentenceTransformer(
    (0): Transformer(
      (auto_model): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (1): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (2): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (3): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (4): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (5): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
          )
        )
      )
    )
    (1): Pooling()
  )
  (loss_fct): MSELoss()
  (cos_score_transformation): Identity()
)
