Step 1: Build the word_embedding model
word_embedding_model = models.Transformer(model_name)
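models.Transformer wraps a Hugging Face transformer: instantiating it loads both the pretrained model and its tokenizer in one step. A minimal sketch of this step, assuming model_name is 'distilbert-base-uncased' (an assumption, but the DistilBERT printout further down is consistent with it):

from sentence_transformers import models

model_name = 'distilbert-base-uncased'  # assumption: the checkpoint used in this walkthrough
word_embedding_model = models.Transformer(model_name)
print(word_embedding_model.get_word_embedding_dimension())  # 768 for DistilBERT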
Step 2: Pooling
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
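pooling_mode_mean_tokens=True means the sentence vector is the average of the token embeddings, with padding masked out. A minimal sketch of that computation in plain torch (for illustration only, not the library's internal code):

import torch

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sentence
    return summed / counts                           # (batch, hidden) sentence vectors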
Step 3: The sentence-similarity model
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
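With the two modules chained, the model maps a list of sentences directly to fixed-size embeddings. A small usage sketch (the sentences are made up for illustration):

embeddings = model.encode(['A man is eating food.', 'A man is eating a piece of bread.'])
print(embeddings.shape)  # (2, 768) with the DistilBERT backbone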
Step 4: The data format, InputExample
Each row of the csv file (the data read below has at least the columns split, score, sentence1 and sentence2) is turned into an InputExample:
inp_example = InputExample(texts=[row['sentence1'], row['sentence2']], label=score)
Here InputExample is a plain class with two attributes, texts and label: texts holds the two sentences and label holds the score.
Confusion point 1: the code below already reads the entire csv file into memory, so when the DataLoader is used, is it only there to split the data into batches and shuffle their order? (See the sketch after the code block.)
train_samples = []
dev_samples = []
test_samples = []
with gzip.open(sts_dataset_path, 'rt', encoding='utf-8') as fIn:
    reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        score = float(row['score']) / 5.0
        inp_example = InputExample(texts=[row['sentence1'], row['sentence2']], label=score)

        if row['split'] == 'dev':
            dev_samples.append(inp_example)

        elif row['split'] == 'test':
            test_samples.append(inp_example)

        else:
            train_samples.append(inp_example)

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)
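On confusion point 1: with a plain Python list as the dataset, the DataLoader is indeed there mainly for shuffling and batch splitting. The step that turns InputExample objects into tensors is the collate function, which model.fit installs on the dataloader itself (smart_batching_collate tokenizes each batch on the fly). A simplified sketch of what fit effectively does, shown only for illustration (optimizer and backward steps omitted):

train_dataloader.collate_fn = model.smart_batching_collate
for features, labels in train_dataloader:
    loss_value = train_loss(features, labels)  # forward pass of CosineSimilarityLoss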
Step 5: The similarity loss, CosineSimilarityLoss
train_loss = losses.CosineSimilarityLoss(model=model)
Step 6: Training. The evaluator argument is the evaluator used during training; it is set aside for now.
model.fit(train_objectives=[(train_dataloader, train_loss)],
          evaluator=evaluator,
          epochs=num_epochs,
          evaluation_steps=1000,
          warmup_steps=warmup_steps,
          output_path=model_save_path
          )
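warmup_steps, num_epochs and the evaluator are not defined in the snippet above. A hedged sketch of typical values, following the convention of the library's STS example (the 10% warmup fraction and the batch size are assumptions, not requirements):

import math
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

num_epochs = 4          # assumption
train_batch_size = 16   # assumption
warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% of training steps
evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_samples, name='sts-dev')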
Key files in the code:
| Filename | Key classes and variables | Notes |
| --- | --- | --- |
| configuration_auto.py | AutoConfig: loads model parameters via its from_pretrained method. It does two things: first, it uses PretrainedConfig to read the parameter file and turn it into a dict; second, it uses CONFIG_MAPPING to pick the model configuration class and then calls from_dict to turn the parameters into a configuration object, which is passed to the AutoModel class to build the model. CONFIG_MAPPING: an OrderedDict constant used to select the defined model configuration class, e.g. DistilBertConfig (configuration_distilbert.py). That class inherits from PretrainedConfig, so the from_dict method is available on every model configuration class. A model configuration class supplies the parameters used to build the model. | from_pretrained method; one model configuration class |
| configuration_utils.py | PretrainedConfig: reads model parameters, mainly through the get_config_dict() method, which returns a dict. It is also the base class of every model configuration class. | |
| Transformer.py (a local file) | The Transformer class: it inherits from torch.nn.Module, so the model structure lives inside it and it has a forward method. Instantiating this class instantiates both the model and the tokenizer. | |
configuration_auto.py
AutoConfig class: used to load model parameters
CONFIG_MAPPING: a constant dict used to select the model
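A small sketch of what this lookup looks like from the caller's side, using the public from_pretrained entry point (the checkpoint name is an assumption):

from transformers import AutoConfig

# get_config_dict reads config.json into a dict; CONFIG_MAPPING then picks the
# concrete class (DistilBertConfig here) and from_dict builds the config object.
config = AutoConfig.from_pretrained('distilbert-base-uncased')
print(type(config).__name__)        # DistilBertConfig
print(config.dim, config.n_layers)  # 768, 6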
The loss used for training is itself a model: CosineSimilarityLoss is a module that wraps the SentenceTransformer, not just a function that computes the loss value.
CosineSimilarityLoss(
  (model): SentenceTransformer(
    (0): Transformer(
      (auto_model): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (1): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (2): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (3): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (4): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (5): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
          )
        )
      )
    )
    (1): Pooling()
  )
  (loss_fct): MSELoss()
  (cos_score_transformation): Identity()
)
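The printout shows why the loss counts as a model: besides the wrapped SentenceTransformer it owns a loss_fct (MSELoss) and a cos_score_transformation (Identity). A simplified sketch of its forward logic (paraphrased for illustration, not the library's exact code):

import torch
import torch.nn.functional as F

def cosine_similarity_loss(embeddings_a, embeddings_b, labels):
    # embeddings_a / embeddings_b come from the wrapped SentenceTransformer,
    # one embedding per sentence in each pair
    scores = F.cosine_similarity(embeddings_a, embeddings_b)  # cos_score_transformation is Identity
    return F.mse_loss(scores, labels.float())                 # loss_fct is MSELoss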