
Implementing Common NLP Loss Functions: SoftMax / Contrastive / Triplet / Similarity

  The loss functions most commonly used in NLP are multi-class classification (SoftMax + CrossEntropy), contrastive learning (Contrastive Loss), triplet loss (Triplet Loss), and sentence similarity (Sentence Similarity). Classification and sentence similarity are the two most frequently used objectives, while contrastive and triplet losses are self-supervised objectives that have gained popularity in the last couple of years.

  This post is not a theoretical treatment of these losses; it simply implements the four of them so that the corresponding modules can be dropped into model experiments quickly. To make the computation and the results easy to inspect, each implementation comes with a small demo built on HuggingFace BERT (no training loop). You can plug the corresponding loss module directly into your own model framework.
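
  Note that all four demos obtain a sentence embedding by simply averaging BERT's token embeddings. A masked mean pooling that ignores padding tokens is usually preferable; the helper below is a minimal sketch of that idea (it is not part of the original demos and assumes HuggingFace-style token embeddings and attention masks):

import torch

def masked_mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: [batch_size, max_seq_len, hidden_dim]
    # attention_mask:   [batch_size, max_seq_len], 1 for real tokens and 0 for padding
    mask = attention_mask.unsqueeze(-1).float()    # [batch_size, max_seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts                         # [batch_size, hidden_dim]

  In the demos it could replace the plain torch.mean(rep, dim=1) pooling, e.g. rep_a = masked_mean_pooling(model(**features1)[0], features1['attention_mask']).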


1. Classification Loss: SoftMax + CrossEntropy

  The classification loss takes a single sentence (or a sentence pair) as input and performs multi-class classification over its representation. A short numeric sketch of the objective comes first, followed by the full code:
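
  For intuition, here is a minimal numeric sketch of the SoftMax + CrossEntropy objective that the module below wraps (the logits are made-up values, not outputs of the demo):

import torch
from torch import nn

# hypothetical classifier logits for a batch of two examples and their gold labels
logits = torch.tensor([[2.0, 0.5], [0.3, 1.2]])  # [batch_size, num_labels]
label = torch.tensor([1, 0])                     # [batch_size]
loss = nn.CrossEntropyLoss()(logits, label)      # mean over the batch of -log softmax(logits)[i, label[i]]
print(loss)

  The full module, including the sentence-pair combination strategies, is shown below: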

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:25
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : SoftmaxLayerWithLoss.py
# !/usr/bin/env python
# coding=utf-8

import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class SoftmaxLayerWithLoss(nn.Module):
    """
    This loss applies a softmax classifier over input sentence (pair) representations and computes the cross-entropy with the given labels

    @:param hidden_dim: The hidden dimension
    @:param num_labels: The number of labels
    @:param is_sentence_pair: (bool) Whether to feed sentence pair
    @:param combine_type: The type of combination of sentence pair:
    - cat: rep = torch.cat([rep_a, rep_b], -1)
    - diff: rep = rep_a - rep_b
    - mul: rep = rep_a * rep_b
    - avg: rep = (rep_a + rep_b) / 2.0
    - sum: rep = rep_a + rep_b
    """
    def __init__(self,
                 hidden_dim: int,
                 num_labels: int,
                 is_sentence_pair=False,
                 combine_type='cat', # cat / diff / mul / avg / sum
                 ):
        super(SoftmaxLayerWithLoss, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_labels = num_labels
        self.is_sentence_pair = is_sentence_pair
        self.combine_type = combine_type
        assert self.combine_type in ['cat', 'diff', 'mul', 'avg', 'sum']
        if self.combine_type == 'cat':
            self.hidden_dim = self.hidden_dim * 2

        self.classifier = nn.Linear(self.hidden_dim, num_labels)

    def forward(self, rep_a, rep_b=None, label: Tensor=None):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim]
        rep = None
        if self.combine_type == 'cat':
            rep = torch.cat([rep_a, rep_b], -1)

        if self.combine_type == 'diff':
            rep = rep_a - rep_b

        if self.combine_type == 'mul':
            rep = rep_a * rep_b

        if self.combine_type == 'avg':
            rep = (rep_a + rep_b) / 2

        if self.combine_type == 'sum':
            rep = rep_a + rep_b

        output = self.classifier(rep)
        loss_fct = nn.CrossEntropyLoss()

        if label is not None:
            loss = loss_fct(output, label.view(-1))
            return loss
        else:
            return rep, output


if __name__ == "__main__":
    # configure for huggingface pre-trained language models
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for huggingface pre-trained language models
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin for huggingface pre-trained language models
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; corresponding examples form a pair
    examples1 = ['This is the book.', 'Disney film is well seeing for us.']
    examples2 = ['I love to read it.', 'I don\'t want to have a try due to the hardness.']
    label = [1, 0]
    # convert each example for feature
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # padding and convert to feature batch
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embedding by averaged pooling
    rep_a = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, dim=1)  # mean pooling over tokens -> [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, dim=1)  # mean pooling over tokens -> [batch_size, hidden_dim]
    # compute the classification loss
    loss_fn = SoftmaxLayerWithLoss(hidden_dim=rep_a.shape[-1], num_labels=2, is_sentence_pair=True, combine_type='cat')
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss) # scalar cross-entropy loss; the exact value depends on the pre-trained weights

2. Sentence Similarity Loss

  The sentence-similarity loss computes the cosine similarity between two sentence representations and uses it as the predicted similarity score; the loss is then the mean squared error (MSE) between this score and the gold label. A short numeric sketch comes first, followed by the full code:
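
  For intuition, here is a minimal numeric sketch of the loss computed by the CosineSimilarityLoss module below (the cosine scores are made-up values, not outputs of the demo):

import torch

# hypothetical cosine similarities for two sentence pairs and their gold labels
cos_scores = torch.tensor([0.95, 0.20])
gold = torch.tensor([1.0, 0.0])
loss = torch.mean((cos_scores - gold) ** 2)  # MSE: (0.05**2 + 0.20**2) / 2 = 0.02125
print(loss)

  The full module and demo are shown below: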

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:55
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : SimilarityLoss.py
# !/usr/bin/env python
# coding=utf-8

import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class CosineSimilarityLoss(nn.Module):
    """
    CosineSimilarityLoss expects that each input example consists of two texts and a float label.

    It computes the vectors u = model(input_text[0]) and v = model(input_text[1]) and measures the cosine similarity between the two.
    By default, it minimizes the following loss: ||input_label - cos_score_transformation(cosine_sim(u,v))||_2.

    :param loss_fct: Which pytorch loss function should be used to compare cosine_similarity(u,v) with the input_label? By default, MSE: ||input_label - cosine_sim(u,v)||_2
    :param cos_score_transformation: The cos_score_transformation function is applied on top of cosine_similarity. By default, the identity function is used (i.e. no change).

    """
    def __init__(self, loss_fct = nn.MSELoss(), cos_score_transformation=nn.Identity()):
        super(CosineSimilarityLoss, self).__init__()
        self.loss_fct = loss_fct
        self.cos_score_transformation = cos_score_transformation

    def forward(self, rep_a, rep_b, label: Tensor):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim]
        output = self.cos_score_transformation(torch.cosine_similarity(rep_a, rep_b))
        # output: cosine similarity for each pair, shape [batch_size]
        return self.loss_fct(output, label.view(-1))

if __name__ == "__main__":
    # configure for huggingface pre-trained language models
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for huggingface pre-trained language models
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin for huggingface pre-trained language models
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; corresponding examples form a pair
    examples1 = ['Beijing is one of the biggest city in China.', 'Disney film is well seeing for us.']
    examples2 = ['Shanghai is the largest city in east of China.', 'ACL 2021 will be held in line due to COVID-19.']
    label = [1, 0]
    # convert each example for feature
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # padding and convert to feature batch
    max_seq_len = 24
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).float() # MSE expects float targets
    # obtain sentence embedding by averaged pooling
    rep_a = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, dim=1)  # mean pooling over tokens -> [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, dim=1)  # mean pooling over tokens -> [batch_size, hidden_dim]
    # compute the cosine-similarity (MSE) loss
    loss_fn = CosineSimilarityLoss()
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss) # scalar MSE loss; the exact value depends on the pre-trained weights

3. Contrastive Loss

  Contrastive learning works with an anchor and a set of candidates. The anchor is a feature vector, typically produced by a neural encoder such as BERT; the candidates contain a positive (same class as the anchor) and several negatives (different class from the anchor). The objective is to make same-class pairs as similar as possible and different-class pairs as dissimilar as possible. A short numeric sketch comes first, followed by the code and a demo:
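
  For intuition, here is a minimal numeric sketch of the margin-based contrastive loss implemented by the ContrastiveLoss module below (the distances are made-up values; margin = 0.5 as in the default):

import torch
import torch.nn.functional as F

# hypothetical anchor-candidate distances and labels (1 = positive pair, 0 = negative pair)
distances = torch.tensor([0.1, 0.7, 0.3])
label = torch.tensor([1.0, 0.0, 0.0])
margin = 0.5
losses = 0.5 * (label * distances.pow(2) + (1 - label) * F.relu(margin - distances).pow(2))
# per pair: [0.5 * 0.1**2, 0.0 (distance already exceeds the margin), 0.5 * (0.5 - 0.3)**2] = [0.005, 0.0, 0.02]
print(losses.sum())  # tensor(0.0250)

  The full module and demo are shown below: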

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 14:50
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : ContrastiveLoss.py
# !/usr/bin/env python
# coding=utf-8

from enum import Enum
import torch
import torch.nn.functional as F
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class SiameseDistanceMetric(Enum):
    """
    The metric for the contrastive loss
    """
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)
    COSINE_DISTANCE = lambda x, y: 1-F.cosine_similarity(x, y)


class ContrastiveLoss(nn.Module):
    """
    Contrastive loss. Expects as input two texts and a label of either 0 or 1. If the label == 1, then the distance between the
    two embeddings is reduced. If the label == 0, then the distance between the embeddings is increased.

    @:param distance_metric: The distance metric function
    @:param margin: (float) The margin distance
    @:param size_average: (bool) Whether to get averaged loss

    Input example of forward function:
        rep_anchor: [[0.2, -0.1, ..., 0.6], [0.2, -0.1, ..., 0.6], ..., [0.2, -0.1, ..., 0.6]]
        rep_candidate: [[0.3, 0.1, ..., -0.3], [-0.8, 1.2, ..., 0.7], ..., [-0.9, 0.1, ..., 0.4]]
        label: [0, 1, ..., 1]

    Return example of forward function:
        0.015 (averaged)
        2.672 (sum)
    """

    def __init__(self, distance_metric=SiameseDistanceMetric.COSINE_DISTANCE, margin: float = 0.5, size_average:bool = False):
        super(ContrastiveLoss, self).__init__()
        self.distance_metric = distance_metric
        self.margin = margin
        self.size_average = size_average

    def forward(self, rep_anchor, rep_candidate, label: Tensor):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_candidate: [batch_size, hidden_dim] denotes the representations of positive / negative
        # label: [batch_size] denotes the label of each anchor - candidate pair (1 = positive, 0 = negative)

        distances = self.distance_metric(rep_anchor, rep_candidate)
        losses = 0.5 * (label.float() * distances.pow(2) + (1 - label).float() * F.relu(self.margin - distances).pow(2))
        return losses.mean() if self.size_average else losses.sum()


if __name__ == "__main__":
    # configure for huggingface pre-trained language models
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for huggingface pre-trained language models
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin for huggingface pre-trained language models
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; corresponding examples form a pair
    examples1 = ['This is the sentence anchor 1.', 'It is the second sentence in this article named Section D.']
    examples2 = ['It is the same as anchor 1.', 'I think it is different with Section D.']
    label = [1, 0]
    # convert each example for feature
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # padding and convert to feature batch
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embedding by averaged pooling
    rep_anchor = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_candidate = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_anchor = torch.mean(rep_anchor, dim=1) # mean pooling over tokens -> [batch_size, hidden_dim]
    rep_candidate = torch.mean(rep_candidate, dim=1) # mean pooling over tokens -> [batch_size, hidden_dim]
    # obtain contrastive loss
    loss_fn = ContrastiveLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_candidate=rep_candidate, label=label)
    print(loss) # scalar contrastive loss; the exact value depends on the pre-trained weights

4. Triplet Loss

  Triplet loss is closely related to contrastive learning: it pulls the anchor toward the positive and pushes it away from the negative. The difference is that it constrains the two distances jointly, requiring the anchor-negative distance to exceed the anchor-positive distance by at least a margin, so the objective is a margin (hinge) loss. A short numeric sketch comes first, followed by the code:
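
  For intuition, here is a minimal numeric sketch of the triplet margin loss implemented by the TripletLoss module below (the distances are made-up values; margin = 0.5 as in the default):

import torch
import torch.nn.functional as F

# hypothetical distances for two triplets
distance_pos = torch.tensor([0.3, 0.8])  # distance(anchor, positive)
distance_neg = torch.tensor([0.9, 0.6])  # distance(anchor, negative)
margin = 0.5
losses = F.relu(distance_pos - distance_neg + margin)
# per triplet: [max(0, 0.3 - 0.9 + 0.5), max(0, 0.8 - 0.6 + 0.5)] = [0.0, 0.7]
print(losses.mean())  # tensor(0.3500)

  The full module and demo are shown below: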

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 15:25
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : TripletLoss.py
# !/usr/bin/env python
# coding=utf-8

from enum import Enum
import torch
from torch import nn, Tensor
import torch.nn.functional as F
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class TripletDistanceMetric(Enum):
    """
    The metric for the triplet loss
    """
    COSINE = lambda x, y: 1 - F.cosine_similarity(x, y)
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)

class TripletLoss(nn.Module):
    """
    This class implements triplet loss. Given a triplet of (anchor, positive, negative),
    the loss minimizes the distance between anchor and positive while it maximizes the distance
    between anchor and negative. It computes the following loss function:

    loss = max(||anchor - positive|| - ||anchor - negative|| + margin, 0).

    The margin is an important hyperparameter and needs to be tuned for the task.

    @:param distance_metric: The distance metric function
    @:param triplet_margin: (float) The margin distance

    Input example of forward function:
        rep_anchor: [[0.2, -0.1, ..., 0.6], [0.2, -0.1, ..., 0.6], ..., [0.2, -0.1, ..., 0.6]]
        rep_positive: [[0.2, -0.1, ..., 0.5], [0.3, -0.2, ..., 0.7], ..., [0.1, -0.1, ..., 0.6]]
        rep_negative: [[0.3, 0.1, ..., -0.3], [-0.8, 1.2, ..., 0.7], ..., [-0.9, 0.1, ..., 0.4]]

    Return example of forward function:
        0.015 (averaged over the batch)

    """
    def __init__(self, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin: float = 0.5):
        super(TripletLoss, self).__init__()
        self.distance_metric = distance_metric
        self.triplet_margin = triplet_margin


    def forward(self, rep_anchor, rep_positive, rep_negative):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_positive: [batch_size, hidden_dim] denotes the representations of positives (e.g. a dropout- or noise-augmented view of the anchor)
        # rep_negative: [batch_size, hidden_dim] denotes the representations of negatives
        distance_pos = self.distance_metric(rep_anchor, rep_positive)
        distance_neg = self.distance_metric(rep_anchor, rep_negative)

        losses = F.relu(distance_pos - distance_neg + self.triplet_margin)
        return losses.mean()


if __name__ == "__main__":
    # configure for huggingface pre-trained language models
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for huggingface pre-trained language models
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin for huggingface pre-trained language models
    model = BertModel.from_pretrained('bert-base-cased')
    # build one anchor sentence together with positive and negative candidate batches
    anchor_example = ['I am an anchor, which is the source example sampled from corpora.'] # anchor sentence
    positive_example = [
        'I am an anchor, which is the source example.',
        'I am the source example sampled from corpora.'
    ] # positives, e.g. obtained from the anchor by random dropout or noise
    negative_example = [
        'It is different with the anchor.',
        'My name is Jianing Wang, please give me some stars, thank you!'
    ] # negatives, randomly sampled from the corpora
    # convert each example for feature
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    anchor_feature = tokenizer(anchor_example, add_special_tokens=True, padding=True)
    positive_feature = tokenizer(positive_example, add_special_tokens=True, padding=True)
    negative_feature = tokenizer(negative_example, add_special_tokens=True, padding=True)
    # padding and convert to feature batch
    max_seq_len = 24
    anchor_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in anchor_feature.items()}
    positive_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in positive_feature.items()}
    negative_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in negative_feature.items()}
    # obtain sentence embedding by averaged pooling
    rep_anchor = model(**anchor_feature)[0] # [1, max_seq_len, hidden_dim]
    rep_positive = model(**positive_feature)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_negative = model(**negative_feature)[0] # [batch_size, max_seq_len, hidden_dim]
    # mean pooling over tokens; the single anchor broadcasts against the positives / negatives
    rep_anchor = torch.mean(rep_anchor, dim=1) # [1, hidden_dim]
    rep_positive = torch.mean(rep_positive, dim=1) # [batch_size, hidden_dim]
    rep_negative = torch.mean(rep_negative, dim=1) # [batch_size, hidden_dim]
    # compute the triplet loss
    loss_fn = TripletLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_positive=rep_positive, rep_negative=rep_negative)
    print(loss) # scalar triplet loss; the exact value depends on the pre-trained weights