
Dropout Layer, Linear Layer, and LayerNorm


Dropout layer: prevents overfitting

During training, nn.Dropout zeroes each element independently with probability p and scales the surviving elements by 1/(1-p), so the expected value of the output equals the input (inverted dropout).

In eval() mode dropout is disabled and the input passes through unchanged.

Usage:

    In [1]: import torch
    In [2]: p = 0.5
    In [3]: module = torch.nn.Dropout(p)
    In [4]: module.training
    Out[4]: True
    In [5]: inp = torch.ones(3, 5)
    In [6]: print(inp)
    tensor([[1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.]])
    In [7]: module(inp)
    Out[7]:
    tensor([[0., 0., 0., 2., 0.],
            [2., 2., 0., 0., 2.],
            [2., 0., 2., 2., 2.]])
    In [10]: 1 / (1 - p)
    Out[10]: 2.0
    In [11]: module.eval()
    Out[11]: Dropout(p=0.5, inplace=False)
    In [12]: module.training
    Out[12]: False
    In [13]: module(inp)
    Out[13]:
    tensor([[1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1.]])
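A quick sanity check (a sketch, not from the original post): averaging many training-mode dropout passes confirms that the 1/(1-p) rescaling keeps the expected output equal to the input.

    import torch

    torch.manual_seed(0)
    module = torch.nn.Dropout(p=0.5)
    inp = torch.ones(3, 5)

    # Each pass keeps an element with probability 1-p and scales it by 2,
    # so the average over many passes converges to the original input.
    avg = torch.stack([module(inp) for _ in range(10_000)]).mean(0)
    print(avg)  # every entry is close to 1.0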

Linear layer

nn.Linear applies an affine transformation y = x·Wᵀ + b to the last dimension of its input; any number of leading (batch) dimensions is allowed.

    In [1]: import torch
    In [2]: module = torch.nn.Linear(10, 20)
    In [3]: module
    Out[3]: Linear(in_features=10, out_features=20, bias=True)
    In [4]: n_samples = 40
    In [5]: inp_2d = torch.rand(n_samples, 10)
    In [6]: module(inp_2d).shape
    Out[6]: torch.Size([40, 20])
    In [7]: inp_3d = torch.rand(n_samples, 33, 10)
    In [8]: module(inp_3d).shape
    Out[8]: torch.Size([40, 33, 20])
    In [10]: input_7d = torch.rand(n_samples, 2, 3, 4, 5, 6, 10)
    In [12]: module(input_7d).shape
    Out[12]: torch.Size([40, 2, 3, 4, 5, 6, 20])
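The shapes above follow because only the last dimension is transformed. A minimal sketch (my own check, not from the post) verifying the affine formula against the module:

    import torch

    # nn.Linear stores weight with shape (out_features, in_features),
    # so the manual computation is x @ weight.T + bias.
    module = torch.nn.Linear(10, 20)
    x = torch.rand(40, 33, 10)
    manual = x @ module.weight.T + module.bias
    print(torch.allclose(module(x), manual))  # True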

LayerNorm properties

With elementwise_affine=False there are no learnable parameters: LayerNorm simply normalizes each sample over its last dimension to zero mean and unit standard deviation.

    In [1]: import torch
    In [2]: inp = torch.tensor([[0, 4.], [-1, 7], [3, 5]])
    In [3]: n_samples, n_features = inp.shape
    In [4]: print(n_samples)
    3
    In [5]: module = torch.nn.LayerNorm(n_features, elementwise_affine=False)
    In [6]: sum(p.numel() for p in module.parameters() if p.requires_grad)
    Out[6]: 0
    In [7]: inp.mean(-1), inp.std(-1, unbiased=False)
    Out[7]: (tensor([2., 3., 4.]), tensor([2., 4., 1.]))
    In [8]: module(inp).mean(-1), module(inp).std(-1, unbiased=False)
    Out[8]:
    (tensor([ 0.0000e+00, -2.9802e-08,  1.1921e-07]),
     tensor([1.0000, 1.0000, 1.0000]))
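The same result can be reproduced by hand. A sketch (assuming the module's default eps=1e-5) of the formula (x - mean) / sqrt(var + eps) with the biased variance:

    import torch

    inp = torch.tensor([[0, 4.], [-1, 7], [3, 5]])
    module = torch.nn.LayerNorm(2, elementwise_affine=False)

    # LayerNorm uses the biased (population) variance and adds eps
    # inside the square root for numerical stability.
    mean = inp.mean(-1, keepdim=True)
    var = inp.var(-1, unbiased=False, keepdim=True)
    manual = (inp - mean) / torch.sqrt(var + module.eps)
    print(torch.allclose(module(inp), manual))  # True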

With elementwise_affine=True there are 4 learnable parameters: a weight and a bias, each of size n_features = 2. The output is weight * normalized + bias, so after setting the bias to 1 and the weight to 4, each sample's mean becomes 1 and its std becomes 4.

    In [9]: module = torch.nn.LayerNorm(n_features, elementwise_affine=True)
    In [10]: sum(p.numel() for p in module.parameters() if p.requires_grad)
    Out[10]: 4
    In [11]: (module.bias, module.weight)
    Out[11]:
    (Parameter containing:
     tensor([0., 0.], requires_grad=True),
     Parameter containing:
     tensor([1., 1.], requires_grad=True))
    In [12]: module(inp).mean(-1), module(inp).std(-1, unbiased=False)
    Out[12]:
    (tensor([ 0.0000e+00, -2.9802e-08,  1.1921e-07], grad_fn=<MeanBackward1>),
     tensor([1.0000, 1.0000, 1.0000], grad_fn=<StdBackward1>))
    In [13]: module.bias.data += 1
    In [14]: module.weight.data *= 4
    In [15]: module(inp).mean(-1), module(inp).std(-1, unbiased=False)
    Out[15]:
    (tensor([1.0000, 1.0000, 1.0000], grad_fn=<MeanBackward1>),
     tensor([4.0000, 4.0000, 4.0000], grad_fn=<StdBackward1>))
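To make the affine step explicit, here is a sketch (my own check, not from the post) showing that the affine module's output equals weight * normalized + bias, where "normalized" is the non-affine LayerNorm output:

    import torch

    inp = torch.tensor([[0, 4.], [-1, 7], [3, 5]])

    affine = torch.nn.LayerNorm(2, elementwise_affine=True)
    affine.bias.data += 1
    affine.weight.data *= 4

    # The affine output is the plain normalization rescaled elementwise
    # by weight and shifted by bias.
    plain = torch.nn.LayerNorm(2, elementwise_affine=False)
    manual = plain(inp) * affine.weight + affine.bias
    print(torch.allclose(affine(inp), manual))  # True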

Normalization is applied only over the last n_features dimension; each sample is handled independently, so the result does not depend on n_samples or on any of the intermediate dimensions.

    In [16]: module(torch.rand(n_samples, 2, 3, 4, 5, 6, n_features)).shape
    Out[16]: torch.Size([3, 2, 3, 4, 5, 6, 2])
    In [17]: module(torch.rand(n_samples, 2, 3, 4, 5, 6, n_features)).mean(-1)
    Out[17]:
    tensor([[[[[[1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]],
               [[1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
                [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]]]]]],
           grad_fn=<MeanBackward1>)
    In [18]: module(torch.rand(n_samples, 2, 3, 4, 5, 6, n_features)).std(-1, unbiased=False)
    Out[18]:
    tensor([[[[[[3.9998, 3.9962, 3.9993, 3.9966, 3.9657, 3.9954],
                [3.9998, 3.9997, 3.9996, 3.9981, 3.9981, 3.9983],
                [3.9799, 3.9996, 3.9657, 3.9998, 3.9998, 3.9986],
                [3.9990, 3.9950, 3.9885, 3.9996, 3.9582, 3.9996],
                [3.9987, 3.9989, 3.9900, 3.9992, 3.9992, 3.9994]],
               [[3.9996, 3.9996, 3.9972, 3.9931, 3.9998, 3.5468],
                [3.9998, 3.9808, 3.9974, 3.9985, 3.9992, 3.9986],
                [1.1207, 3.9993, 3.9998, 3.6870, 3.9997, 3.9981],
                [3.9998, 3.9998, 3.9986, 3.9676, 3.9999, 3.9998],
                [3.9993, 3.9999, 3.9998, 3.9852, 3.9993, 3.9891]]]]]],
           grad_fn=<StdBackward1>)
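Why do a few std entries fall well below 4 (e.g., 1.1207)? With only n_features = 2, when the two random values in a row happen to be nearly equal, the eps term inside LayerNorm dominates the tiny variance, so the normalized std drops below 1 and the scaled std drops below 4. A small sketch of this effect (assuming the default eps=1e-5):

    import torch

    # Two nearly equal features: the biased variance (~2.5e-7) is small
    # compared to eps=1e-5, so the normalized values shrink toward zero.
    module = torch.nn.LayerNorm(2, elementwise_affine=False)
    x = torch.tensor([[0.5000, 0.5010]])
    print(module(x).std(-1, unbiased=False))  # well below 1.0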
