This function implements the forward pass of a Transformer encoder layer: the self-attention block, the feedforward network, and the layer normalization applied around them. It defines how the input is propagated through the encoder layer. The forward function does the following:

It first stores the input src (the input sequence) in the variable x, which is then updated by the subsequent computations.

Depending on the norm_first flag, the operations are applied in one of two orders. If norm_first is True (pre-norm), layer normalization (self.norm1) is applied first, the result goes through the self-attention block (self._sa_block), and the block's output is added back to x as a residual; the second layer normalization (self.norm2) followed by the feedforward block (self._ff_block) is then applied in the same way.

If norm_first is False (post-norm, the default), the self-attention block (self._sa_block) runs first, its output is added to x, and the sum is passed through the first layer normalization (self.norm1); the feedforward block (self._ff_block) and the second layer normalization (self.norm2) then repeat the same pattern. A short usage sketch of the two orderings follows, and the full source of the layer is quoted after it.
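Here is a minimal usage sketch of the two orderings, assuming a PyTorch version that exposes the norm_first argument shown in the layer's signature below; the hyperparameters and tensor shape are taken from the docstring example and are illustrative only:

import torch
import torch.nn as nn

# Post-norm (norm_first=False, the default): residual add first, then LayerNorm.
post_norm_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=False)

# Pre-norm (norm_first=True): LayerNorm first, then the sub-block, then the residual add.
pre_norm_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)

src = torch.rand(10, 32, 512)    # (seq_len, batch, d_model), since batch_first=False
out_post = post_norm_layer(src)  # shape (10, 32, 512)
out_pre = pre_norm_layer(src)    # shape (10, 32, 512)

Both layers return a tensor with the same shape as src; only the placement of norm1 and norm2 relative to the residual connections differs.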
# Source of torch.nn.TransformerEncoderLayer (torch/nn/modules/transformer.py).
# Imports needed to run this excerpt on its own:
from typing import Optional

import torch.nn.functional as F
from torch import Tensor
from torch.nn import Dropout, LayerNorm, Linear, Module, MultiheadAttention
# _get_activation_fn is a private helper defined in the same file as the layer.
from torch.nn.modules.transformer import _get_activation_fn


class TransformerEncoderLayer(Module):
    r"""TransformerEncoderLayer is made up of self-attn and feedforward network.
    This standard encoder layer is based on the paper "Attention Is All You Need".
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
    Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in
    Neural Information Processing Systems, pages 6000-6010. Users may modify or implement
    in a different way during application.

    Args:
        d_model: the number of expected features in the input (required).
        nhead: the number of heads in the multiheadattention models (required).
        dim_feedforward: the dimension of the feedforward network model (default=2048).
        dropout: the dropout value (default=0.1).
        activation: the activation function of the intermediate layer, can be a string
            ("relu" or "gelu") or a unary callable. Default: relu
        layer_norm_eps: the eps value in layer normalization components (default=1e-5).
        batch_first: If ``True``, then the input and output tensors are provided
            as (batch, seq, feature). Default: ``False``.
        norm_first: if ``True``, layer norm is done prior to attention and feedforward
            operations, respectively. Otherwise it's done after. Default: ``False`` (after).

    Examples::
        >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
        >>> src = torch.rand(10, 32, 512)
        >>> out = encoder_layer(src)

    Alternatively, when ``batch_first`` is ``True``:
        >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        >>> src = torch.rand(32, 10, 512)
        >>> out = encoder_layer(src)
    """
    __constants__ = ['batch_first', 'norm_first']

    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=F.relu,
                 layer_norm_eps=1e-5, batch_first=False, norm_first=False,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(TransformerEncoderLayer, self).__init__()
        self.self_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first,
                                            **factory_kwargs)
        # Implementation of Feedforward model
        self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
        self.dropout = Dropout(dropout)
        self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)

        self.norm_first = norm_first
        self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.dropout1 = Dropout(dropout)
        self.dropout2 = Dropout(dropout)

        # Legacy string support for activation function.
        if isinstance(activation, str):
            self.activation = _get_activation_fn(activation)
        else:
            self.activation = activation

    def __setstate__(self, state):
        if 'activation' not in state:
            state['activation'] = F.relu
        super(TransformerEncoderLayer, self).__setstate__(state)

    def forward(self, src: Tensor, src_mask: Optional[Tensor] = None, src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
        r"""Pass the input through the encoder layer.
        Args:
            src: the sequence to the encoder layer (required).
            src_mask: the mask for the src sequence (optional).
            src_key_padding_mask: the mask for the src keys per batch (optional).
        Shape:
            see the docs in Transformer class.
        """

        # see Fig. 1 of https://arxiv.org/pdf/2002.04745v1.pdf

        x = src
        if self.norm_first:
            x = x + self._sa_block(self.norm1(x), src_mask, src_key_padding_mask)
            x = x + self._ff_block(self.norm2(x))
        else:
            x = self.norm1(x + self._sa_block(x, src_mask, src_key_padding_mask))
            x = self.norm2(x + self._ff_block(x))

        return x

    # self-attention block
    def _sa_block(self, x: Tensor,
                  attn_mask: Optional[Tensor], key_padding_mask: Optional[Tensor]) -> Tensor:
        x = self.self_attn(x, x, x,
                           attn_mask=attn_mask,
                           key_padding_mask=key_padding_mask,
                           need_weights=False)[0]
        return self.dropout1(x)

    # feed forward block
    def _ff_block(self, x: Tensor) -> Tensor:
        x = self.linear2(self.dropout(self.activation(self.linear1(x))))
        return self.dropout2(x)
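
As a usage note (not part of the quoted source), the sketch below shows one way the optional masks might be passed to forward; the sizes and mask values are illustrative assumptions following the MultiheadAttention conventions, where True marks a position to be ignored:

import torch
import torch.nn as nn

# Small illustrative sizes: batch=2, seq_len=5, d_model=16, nhead=4.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
src = torch.rand(2, 5, 16)  # (batch, seq, feature) because batch_first=True

# src_mask: (seq_len, seq_len) attention mask; True blocks attention to that position.
causal_mask = torch.triu(torch.ones(5, 5), diagonal=1).bool()

# src_key_padding_mask: (batch, seq_len); True marks padding tokens to ignore.
padding_mask = torch.tensor([[False, False, False, True, True],
                             [False, False, False, False, False]])

out = layer(src, src_mask=causal_mask, src_key_padding_mask=padding_mask)
print(out.shape)  # torch.Size([2, 5, 16])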
