[Reading the Source] The FFN Mechanism in Transformer (dropout)

A walkthrough of the FFN code
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    "Implements FFN equation."
    def __init__(self, d_model, d_ff, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        self.w_1 = nn.Linear(d_model, d_ff)  # analysis point 1
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)   # analysis point 2

    def forward(self, x):
        return self.w_2(self.dropout(F.relu(self.w_1(x))))
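As a quick sanity check (a minimal sketch, not part of the original post), the FFN acts on each position independently and leaves the tensor shape unchanged; the batch and sequence sizes below are arbitrary, and 512/2048 are the dimensions quoted in the analysis that follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dimensions assumed from the text: d_model=512, d_ff=2048.
w_1 = nn.Linear(512, 2048)
w_2 = nn.Linear(2048, 512)
drop = nn.Dropout(0.1)

x = torch.rand(2, 10, 512)      # (batch, seq_len, d_model)
y = w_2(drop(F.relu(w_1(x))))   # same composition as forward() above
print(y.shape)                  # torch.Size([2, 10, 512]): shape preserved
```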

Walking through the source

1. Analysis point 1: self.w_1 = nn.Linear(d_model, d_ff)

Here d_model is the embedding dimension, usually 512.
d_ff is the inner-layer dimension: 2048.
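As a rough check (my own sketch, not from the original post), two Linear layers at these sizes account for about 2.1M parameters, which is where most of a Transformer block's feed-forward cost lives:

```python
import torch.nn as nn

w_1 = nn.Linear(512, 2048)   # d_model -> d_ff
w_2 = nn.Linear(2048, 512)   # d_ff -> d_model
n = sum(p.numel() for p in w_1.parameters()) \
  + sum(p.numel() for p in w_2.parameters())
print(n)   # 2099712 = 2*512*2048 (weights) + 2048 + 512 (biases)
```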

2. Analysis point 2: nn.Dropout(dropout)

Reference: https://blog.csdn.net/weixin_42979152/article/details/113769291
Note the difference between nn.Dropout(p) and F.dropout(input, p).

import torch
import torch.nn.functional as F

x = torch.tensor(0.5)
print(x)                                  # tensor(0.5000)
print(F.dropout(x, training=True))        # tensor(0.) or tensor(1.): at p=0.5 a survivor is scaled by 2
  • The signature of torch.nn.functional.dropout is torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False).
    Because training is a plain argument, F.dropout does not track the module's train/eval state: inside a model you should pass training=self.training yourself, otherwise model.eval() has no effect on it.
  • The signature of torch.nn.Dropout is torch.nn.Dropout(p=0.5, inplace=False).
    Function: randomly zeroes some elements of the input tensor; a fresh random mask is sampled on every forward call, and during training the surviving elements are scaled by 1/(1-p).
    p: the probability of zeroing an element.
    inplace: if True, the operation is performed in place.
    The input is a tensor, and the output is a tensor with the same shape as the input.
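The effect of the training flag can be seen directly (a small sketch of my own, not from the original post):

```python
import torch
import torch.nn.functional as F

x = torch.ones(5)

# training=False turns F.dropout into the identity function.
print(F.dropout(x, p=0.5, training=False))   # tensor([1., 1., 1., 1., 1.])

# training=True: each element is zeroed with probability p,
# and survivors are scaled by 1/(1-p) to keep the expected value unchanged.
print(F.dropout(x, p=0.5, training=True))    # zeros and 2.0s in random positions
```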
import torch
from torch import nn

m = nn.Dropout(p=0.3)
input = torch.rand(2, 3)
print(input)
output = m(input)
print(output)
# Sample output (random; the dropped positions differ on every run):
tensor([[0.6117, 0.5744, 0.0756],
        [0.9749, 0.2046, 0.4306]])
tensor([[0.0000, 0.8205, 0.1080],
        [0.0000, 0.0000, 0.0000]])
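Note that in the output above the survivors are not merely copied through: 0.5744 becomes 0.8205 because each surviving element is divided by 1-p. A small check of my own (not in the original post) makes that scaling explicit:

```python
import torch
import torch.nn as nn

m = nn.Dropout(p=0.3)
x = torch.rand(2, 3)
y = m(x)
mask = y != 0                    # positions that survived dropout
# Survivors are exactly x / (1 - p); dropped positions are 0.
assert torch.allclose(y[mask], x[mask] / (1 - 0.3))
print("scaling check passed")
```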
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dropout_1 = nn.Dropout(0.5)
        self.dropout_2 = nn.Dropout(0.5)

    def forward(self, input):
        # Calling the same Dropout module twice samples two different masks.
        drop_1 = self.dropout_1(input)
        print(drop_1)
        drop_1 = self.dropout_1(input)
        print(drop_1)
        # A second Dropout instance is equally random and independent.
        drop_2 = self.dropout_2(input)
        print(drop_2)

if __name__ == '__main__':
    i = torch.rand((3, 3))
    print(i.shape)
    print(i)
    m = MyModel()
    m(i)   # call the module rather than forward() directly
# Sample output (random; masks differ on every run):
torch.Size([3, 3])
tensor([[0.2487, 0.1715, 0.3385],
        [0.0692, 0.9432, 0.7410],
        [0.6616, 0.7565, 0.8751]])
tensor([[0.0000, 0.3430, 0.6769],
        [0.0000, 1.8864, 1.4819],
        [1.3233, 1.5130, 1.7502]])
tensor([[0.0000, 0.0000, 0.0000],
        [0.1385, 0.0000, 0.0000],
        [0.0000, 1.5130, 1.7502]])
tensor([[0.4974, 0.3430, 0.0000],
        [0.0000, 1.8864, 0.0000],
        [0.0000, 0.0000, 0.0000]])
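Finally, the practical reason nn.Dropout is preferred inside modules like PositionwiseFeedForward: it follows the module's train/eval state automatically, so dropout is switched off at inference time with no extra code. A short sketch of my own (not in the original post):

```python
import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)
x = torch.ones(4)

m.train()
print(m(x))   # random zeros; survivors scaled to 2.0

m.eval()      # in eval mode nn.Dropout is the identity
print(m(x))   # tensor([1., 1., 1., 1.])
```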