
model.state_dict(), modules(), children(), named_children(), parameters() Explained

This post is based on: https://blog.csdn.net/qq_31347869/article/details/125065271

Defining the network

We define a custom classification network. The backbone contains Conv, BN, ReLU, and MaxPool layers; the classifier is a two-layer fully connected network:

import torch
import torch.nn as nn

class Net(nn.Module):

    def __init__(self, num_class=10):
        super().__init__()

        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(6),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(12),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(9*8*8, 128),   # 576 input features; tied to the input spatial size
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(128, num_class)
        )

    def forward(self, x):
        output = self.backbone(x)
        output = output.view(output.size()[0], -1)   # flatten to (batch, features)
        output = self.classifier(output)
        return output

model = Net()

The network Net is itself a subclass of nn.Module. It contains two nn.Module subclasses built from Sequential containers, backbone and classifier, and each of those in turn contains individual layers, which are also nn.Module subclasses. So there are three levels from the outside in:

  • Net (an nn.Module subclass)
  • backbone and classifier (Sequential, nn.Module subclasses): the direct submodules of Net
  • the individual layers such as conv, relu, batchnorm (nn.Module subclasses): the submodules of backbone or classifier
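
A quick sanity check of this three-level structure is to count what the traversal methods return (a minimal sketch using the Net defined above; both counts are confirmed in the sections that follow):

len(list(model.modules()))    # 15: Net + 2 Sequential containers + 12 layers
len(list(model.children()))   # 2: backbone and classifier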

Return values of the model's methods

model.modules() 
>>  <generator object Module.modules at 0x7f3aff9f1740>

model.named_modules()                                                                                                       
>>  <generator object Module.named_modules at 0x7f3aff95bf20>

model.children()                                                                                                    
>>  <generator object Module.children at 0x7f3aff9f1660>

model.parameters()
>>  <generator object Module.parameters at 0x7f3aff95bf90>

model.named_parameters()
>>  <generator object Module.named_parameters at 0x7f3aff95bba0>

model.state_dict()
>> OrderedDict([('backbone.0.weight', tensor([[[[-0.1244, ....1406]]]])),
('backbone.0.bias', tensor([ 0.0838, -0....  0.0381])),
('backbone.2.weight', tensor([1., 1., 1., ..., 1., 1.])),
('backbone.2.bias', tensor([0., 0., 0., ..., 0., 0.])),
('backbone.2.running_mean', tensor([0., 0., 0., ..., 0., 0.])),
('backbone.2.running_var', tensor([1., 1., 1., ..., 1., 1.])),
('backbone.2.num_batches_tracked', tensor(0)),
('backbone.4.weight', tensor([[[[ 0.1197, ....1024]]]])),
('backbone.4.bias', tensor([ 0.0406,  0....  0.0971])),
('backbone.6.weight', tensor([1., 1., 1., ..., 1., 1.])),
('backbone.6.bias', tensor([0., 0., 0., ..., 0., 0.])),
('backbone.6.running_mean', tensor([0., 0., 0., ..., 0., 0.])),
('backbone.6.running_var', tensor([1., 1., 1., ..., 1., 1.])),
('backbone.6.num_batches_tracked', tensor(0)), ...])

Except for model.state_dict(), which returns an OrderedDict, all of these methods return a generator. We can materialize each generator into a list with a comprehension:

model_modules = [m for m in model.modules()]          
model_named_modules = [m for m in model.named_modules()]     
   
model_children = [m for m in model.children()]         
model_named_children = [m for m in model.named_children()]  
                                                               
model_parameters = [m for m in model.parameters()]                                                                         
model_named_parameters = [m for m in model.named_parameters()]

model.modules() and model.named_modules()

model.modules() iterates over all of the model's submodules, where a submodule is any layer that subclasses nn.Module.

In the Net defined above, Net itself, backbone, classifier, and every layer they contain subclass nn.Module, so all of them are visited. The traversal is depth-first; calling .modules() on Net visits modules in the order: Net --> backbone --> backbone layers --> classifier --> classifier layers.

model_modules = [m for m in model.modules()]          
model_named_modules = [m for m in model.named_modules()]     
len(model_modules)   # 15
len(model_named_modules)   # 15
Output of model_modules:
model_modules
>>
00: Net(
  (backbone): Sequential(
    (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU(inplace=True)
    (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU(inplace=True)
    (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=576, out_features=128, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=128, out_features=10, bias=True)
  )
)
01:Sequential(
  (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (1): ReLU(inplace=True)
  (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
  (5): ReLU(inplace=True)
  (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
02:Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
03:ReLU(inplace=True)
04:BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
05:MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
06:Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
07:ReLU(inplace=True)
08:BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
09:MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
10:Sequential(
  (0): Linear(in_features=576, out_features=128, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=128, out_features=10, bias=True)
)
11:Linear(in_features=576, out_features=128, bias=True)
12:ReLU(inplace=True)
13:Dropout(p=0.5, inplace=False)
14:Linear(in_features=128, out_features=10, bias=True)

Indexing with model_modules[index] returns the layer at that position.

model.named_modules() is model.modules() with layer names: it returns everything model.modules() does, plus each layer's name. Every yielded element is a tuple whose first item is the layer name and whose second item is the layer itself. Apart from backbone and classifier, which were explicitly named in the model definition, all layers are named automatically by PyTorch's internal rules.

Output of model_named_modules:

00:('', Net(
  (backbone): S...rue)
  )
))
01:('backbone', Sequential(
  (0): C...e=False)
))
02:('backbone.0', Conv2d(3, 6, kernel_...de=(1, 1)))
03:('backbone.1', ReLU(inplace=True))
04:('backbone.2', BatchNorm2d(6, eps=1...tats=True))
05:('backbone.3', MaxPool2d(kernel_siz...ode=False))
06:('backbone.4', Conv2d(6, 12, kernel...de=(1, 1)))
07:('backbone.5', ReLU(inplace=True))
08:('backbone.6', BatchNorm2d(12, eps=...tats=True))
09:('backbone.7', MaxPool2d(kernel_siz...ode=False))
10:('classifier', Sequential(
  (0): L...as=True)
))
11:('classifier.0', Linear(in_features=5...bias=True))
12:('classifier.1', ReLU(inplace=True))
13:('classifier.2', Dropout(p=0.5, inplace=False))
14:('classifier.3', Linear(in_features=1...bias=True))

Each element of model.named_modules() is a tuple holding (name, layer).

Both model.modules() and model.named_modules() can be used to modify specific layers.
1) With model.modules(), use isinstance() to pick out layers of a given type:

for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        ...  # process the layer here

2) With model.named_modules(), if the layers were given explicit names in the model definition (e.g., conv1, conv2, ...), you can filter by name (a concrete example of the first pattern follows the snippet below):

for name, layer in model.named_modules():
    if 'conv' in name:
        ...  # process the layer here
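
As a concrete instance of the first pattern, here is a minimal sketch that re-initializes every convolution layer in the model (the Kaiming initialization is my choice for illustration, not something fixed by the method itself):

for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        # re-initialize conv weights in place and zero the biases
        nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            nn.init.zeros_(layer.bias)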

model.children() and model.named_children()

As described earlier, Net has three levels: 1) Net itself; 2) Net's direct submodules backbone and classifier; 3) the layers inside backbone and classifier, such as conv, relu, and batchnorm.

model.modules() traverses all of the model's submodules, including the submodules of submodules. As a loose analogy, it visits every node of a tree from root to leaves. In the example above, that is every element of all three levels.

model.children() only yields the model's second level, i.e., its immediate children. In the example above, it returns just backbone and classifier: neither Net itself nor the layers inside backbone and classifier. model.named_children() is, as before, model.children() with layer names.

model_children = [m for m in model.children()]
model_named_children = [m for m in model.named_children()]

len(model_children)         # 2: backbone, classifier
len(model_named_children)   # 2: backbone, classifier
Output of model_children:
print(model_children)
>>
[Sequential(
  (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (1): ReLU(inplace=True)
  (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
  (5): ReLU(inplace=True)
  (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
),
 Sequential(
  (0): Linear(in_features=576, out_features=128, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=128, out_features=10, bias=True)
)]
Output of model_named_children:
print(model_named_children)
>> [('backbone', Sequential(
  (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (1): ReLU(inplace=True)
  (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
  (5): ReLU(inplace=True)
  (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)), 
('classifier', Sequential(
  (0): Linear(in_features=576, out_features=128, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=128, out_features=10, bias=True)
))]
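
A typical use of model.named_children() is freezing an entire top-level submodule by name, for example training only the classifier on top of a frozen backbone (a sketch, assuming the Net defined above):

for name, child in model.named_children():
    if name == 'backbone':
        for p in child.parameters():
            p.requires_grad = False   # exclude the backbone from gradient updates

# only the classifier's parameters still require gradients
trainable = [p for p in model.parameters() if p.requires_grad]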

model.parameters() and model.named_parameters()

model.parameters() iterates over all of the model's learnable parameters. Some layers have no learnable parameters (e.g., relu, maxpool), so model.parameters() yields nothing for them.

Correspondingly, model.named_parameters() is model.parameters() with names: each yielded tuple packs two elements, the parameter's name and the parameter itself. The name suffixes .weight and .bias distinguish weights from biases.
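
In practice, model.parameters() is what you hand to an optimizer, and the .weight/.bias suffixes from model.named_parameters() make it easy to give parameter groups different settings, e.g., no weight decay on biases (a sketch; the hyperparameter values are placeholders):

import torch.optim as optim

# simplest case: one group, uniform settings
optimizer = optim.SGD(model.parameters(), lr=0.01)

# per-group settings, split by the .bias name suffix
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if name.endswith('.bias') else decay).append(param)
optimizer = optim.SGD([
    {'params': decay, 'weight_decay': 1e-4},
    {'params': no_decay, 'weight_decay': 0.0},
], lr=0.01)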

Output of model_parameters:

00:Parameter containing:
tensor([[[[-0.1244, -0.0175,  0.1330],
          [-0.0457,  0.0432, -0.0369],
          [-0.1080,  0.1810, -0.1372]],

         [[ 0.1220,  0.0311, -0.1114],
          [-0.0179,  0.1289, -0.1665],
          [ 0.0750, -0.0238, -0.1796]],

         [[ 0.0942, -0.0327,  0.1228],
          [-0.0715, -0.0432, -0.1395],
          [ 0.1167, -0.0147, -0.0270]]],


        [[[-0.1637,  0.1650, -0.0620],
          [ 0.0173, -0.0489, -0.1267],
          [-0.0762,  0.0789,  0.1761]],

         [[-0.1469,  0.1848, -0.1336],
          [ 0.1651, -0.1031, -0.0063],
          [-0.0485,  0.0062, -0.0858]],

         [[-0.0206,  0.1082, -0.0061],
          [ 0.1236, -0.1520, -0.0234],
          [ 0.1549,  0.0840, -0.1836]]],


        [[[-0.1377,  0.0450, -0.1349],
          [-0.1002,  0.0527, -0.0972],
          [-0.1579,  0.0919,  0.1398]],

         [[ 0.0852, -0.1086, -0.1840],
          [ 0.0049,  0.0691,  0.1165],
          [-0.0722, -0.1022,  0.0629]],

         [[-0.0786, -0.1673, -0.1123],
       ...
01:Parameter containing:
tensor([ 0.0838, -0.0134,  0.0715, -0.1095,  0.0014,  0.0381],
       requires_grad=True)
02:Parameter containing:
tensor([1., 1., 1., 1., 1., 1.], requires_grad=True)
03:Parameter containing:
tensor([0., 0., 0., 0., 0., 0.], requires_grad=True)
04:Parameter containing:
tensor([[[[ 0.1197,  0.0643,  0.1191],
          [ 0.0363,  0.1028, -0.0531],
          [ 0.1358, -0.1207, -0.0255]],

         [[-0.0466,  0.0028,  0.0780],
          [-0.0791,  0.0205,  0.0181],
          [-0.0528, -0.0011, -0.0220]],

         [[ 0.1345, -0.0772,  0.0077],
          [-0.1040,  0.0595, -0.0817],
          [-0.0916, -0.0624, -0.1254]],

         [[ 0.0998, -0.1021, -0.1150],
          [-0.0977, -0.0709, -0.0871],
          [-0.0107, -0.0997,  0.0583]],

         [[ 0.0486,  0.1309,  0.0363],
          [-0.0740,  0.0377,  0.0819],
          [ 0.0573,  0.0259,  0.0986]],

         [[-0.1322,  0.0313,  0.0078],
          [-0.0747,  0.1320,  0.0513],
          [ 0.1026, -0.0181,  0.0632]]],


        [[[-0.0956, -0.0236, -0.0942],
          [-0.0468, -0.0420, -0.0762],
          [ 0.0067, -0.1166,  0.1345]],

         [[-0.0682, -0.0900, -0.0113],
          [-0.0375,  0.1338,  0.0583],
          [-0.0046,  0.0325, -0.0510]],

         [[-0.0679, -0.1240, -0.0277],
         ...
05:Parameter containing:
tensor([ 0.0406,  0.0845,  0.0837,  0.0469,  0.0739,  0.0285,  0.1000, -0.0235,
        -0.0329, -0.0202, -0.1350,  0.0971], requires_grad=True)
06:Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)
07:Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
08:Parameter containing:
tensor([[-0.0228, -0.0244,  0.0078,  ...,  0.0054, -0.0198, -0.0202],
        [ 0.0413, -0.0332,  0.0045,  ..., -0.0026, -0.0398,  0.0195],
        [ 0.0372,  0.0036, -0.0085,  ...,  0.0187,  0.0397,  0.0414],
        ...,
        [-0.0279, -0.0212, -0.0143,  ..., -0.0033, -0.0319, -0.0368],
        [ 0.0388, -0.0378, -0.0170,  ...,  0.0089,  0.0411, -0.0220],
        [ 0.0237, -0.0183,  0.0281,  ..., -0.0131,  0.0266, -0.0002]],
       requires_grad=True)
09:Parameter containing:
tensor([-1.1228e-02, -4.0427e-02, -3.4926e-02, -5.9942e-03,  1.4434e-02,
        -3.2394e-02,  7.4684e-03,  2.2023e-02, -2.5365e-02,  3.0215e-03,
        -2.5406e-02,  2.0588e-02, -2.4900e-02,  1.1824e-02, -1.7249e-02,
        -4.0224e-03, -3.6739e-02,  1.3541e-02, -1.2611e-02, -3.9361e-02,
        -7.9021e-03,  2.3295e-02, -3.4035e-02, -9.1412e-03,  2.0373e-02,
         8.0213e-03,  2.9896e-02, -3.7164e-02, -3.7529e-02,  2.6217e-02,
         1.8121e-02,  1.5182e-02, -6.6703e-03,  2.6726e-02,  1.2208e-02,
         2.6780e-02,  2.4205e-02, -3.9448e-02,  7.5433e-03, -3.4596e-02,
        -9.0037e-03,  2.4151e-02, -2.2760e-02, -3.0653e-02, -2.9417e-02,
         9.5226e-03,  1.9289e-03,  2.2931e-02,  6.0337e-03, -3.1319e-02,
         4.1514e-02, -3.2252e-02,  2.7550e-02,  6.5597e-03, -8.9285e-04,
        -5.5861e-03,  4.8723e-03,  5.2264e-03, -2.7704e-02,  2.9970e-02,
         4.0955e-02,  2.5919e-02, -2.4109e-02, -1.9081e-02,  1.5676e-02,
        -1.5039e-02,  3.6761e-02, -2.6627e-02, -2.024...
10:Parameter containing:
tensor([[ 0.0803, -0.0856,  0.0390,  ..., -0.0727,  0.0265,  0.0627],
        [-0.0042, -0.0816,  0.0012,  ..., -0.0799, -0.0748,  0.0160],
        [-0.0106,  0.0765, -0.0407,  ...,  0.0443, -0.0450,  0.0070],
        ...,
        [-0.0768, -0.0706,  0.0364,  ..., -0.0053,  0.0844, -0.0749],
        [ 0.0354, -0.0771,  0.0592,  ...,  0.0094, -0.0855, -0.0805],
        [ 0.0554,  0.0158, -0.0131,  ..., -0.0503, -0.0812, -0.0749]],
       requires_grad=True)
11:Parameter containing:
Output of model_named_parameters (the tensors are identical to those above, so for brevity we print only each layer's name):
for k, v in model.named_parameters():
    print(k)
>> 
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias

Here backbone.1, backbone.3, classifier.1, and classifier.2 correspond to the ReLU, MaxPool2d, ReLU, and Dropout layers respectively; these have no learnable parameters, so model.named_parameters() does not list them. backbone.2 and backbone.6, the BN layers, do have learnable weight and bias parameters, so they appear.

model.state_dict()

model.state_dict() returns all of the model's parameters, both learnable and non-learnable, as an OrderedDict.

As the example below shows, model.state_dict() captures all of the model's learnable parameters (weight, bias) and, in addition, the non-learnable ones (the BN layers' running_mean, running_var, and so on). You can think of model.state_dict() as model.parameters() plus all non-learnable parameters.

OrderedDict([('backbone.0.weight',
              tensor([[[[ 0.1796,  0.0621,  0.1027],
                        [-0.0723, -0.0971,  0.0218],
                        [-0.0835, -0.0479,  0.0305]],
                        ...
                       [[-0.0544, -0.1858,  0.1559],
                        [-0.0589,  0.0146, -0.1285],
                        [-0.1033,  0.0743,  0.1137]]]])),
             ('backbone.0.bias',
              tensor([ 0.0202,  0.1326,  0.0124, -0.1895, -0.1094, -0.1045])),
             ('backbone.2.weight', tensor([1., 1., 1., 1., 1., 1.])),
             ('backbone.2.bias', tensor([0., 0., 0., 0., 0., 0.])),
             ('backbone.2.running_mean', tensor([0., 0., 0., 0., 0., 0.])),
             ('backbone.2.running_var', tensor([1., 1., 1., 1., 1., 1.])),
             ('backbone.2.num_batches_tracked', tensor(0)),
             ('backbone.4.weight',
              tensor([[[[ 1.3451e-01, -7.3591e-02, -1.0690e-01],
                        [-5.4909e-02, -3.3993e-02,  3.3203e-02],
                        [-6.4427e-02,  1.2523e-01, -3.7897e-02]],
                        ...
                       [[-1.0125e-01,  1.7249e-02, -6.3623e-02],
                        [ 4.0353e-02, -7.0894e-02,  6.0606e-03],
                        [ 6.2089e-02,  8.5485e-02,  1.0689e-01]]]])),
             ('backbone.4.bias',
              tensor([ 0.0999, -0.1271,  0.0010,  0.1151, -0.1221,  0.0144,  0.1088,  0.1214,
                      -0.0175, -0.1071,  0.0937, -0.0058])),
             ('backbone.6.weight',
              tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
             ('backbone.6.bias',
              tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
             ('backbone.6.running_mean',
              tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
             ('backbone.6.running_var',
              tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
             ('backbone.6.num_batches_tracked', tensor(0)),
             ('classifier.0.weight',
              tensor([[ 0.0359,  0.0245,  0.0020,  ...,  0.0282, -0.0255, -0.0319],
                      [ 0.0020,  0.0196,  0.0011,  ..., -0.0412,  0.0179,  0.0288],
                      [ 0.0251, -0.0245,  0.0152,  ...,  0.0136,  0.0084, -0.0052],
                      ...,
                      [ 0.0235, -0.0100, -0.0348,  ...,  0.0160, -0.0249, -0.0007],
                      [-0.0385,  0.0202, -0.0359,  ...,  0.0367,  0.0155, -0.0367],
                      [ 0.0092,  0.0375, -0.0229,  ..., -0.0322, -0.0065,  0.0008]])),
             ('classifier.0.bias',
              tensor([ 3.7528e-02, -2.4906e-02, -3.0417e-02, -2.9277e-02,  3.8544e-02,
                      ...
                      -1.4599e-02,  3.6207e-02,  1.8414e-02])),
             ('classifier.3.weight',
              tensor([[-0.0793, -0.0080,  0.0755,  ...,  0.0225,  0.0632,  0.0223],
                      [-0.0861, -0.0295,  0.0301,  ..., -0.0664, -0.0458,  0.0044],
                      [-0.0646,  0.0225, -0.0640,  ..., -0.0004,  0.0289, -0.0165],
                      ...,
                      [-0.0760, -0.0517, -0.0625,  ...,  0.0393, -0.0475, -0.0070],
                      [ 0.0558, -0.0860, -0.0813,  ..., -0.0578, -0.0843, -0.0303],
                      [-0.0077,  0.0227,  0.0247,  ..., -0.0424,  0.0134, -0.0196]])),
             ('classifier.3.bias',
              tensor([-0.0307,  0.0848,  0.0686,  0.0819,  0.0455,  0.0711,  0.0073,  0.0117,
                       0.0293,  0.0431]))])
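
The most common practical use of state_dict() is checkpointing: save the OrderedDict to disk, then load it back into a model with the same architecture (a minimal sketch; the file name is arbitrary):

torch.save(model.state_dict(), 'net.pth')     # saves only parameters/buffers, not the class

model2 = Net()
model2.load_state_dict(torch.load('net.pth'))
model2.eval()   # e.g., so the BN layers use the restored running statistics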

Differences between model.state_dict() and model.parameters()

1. Different return types
model.parameters() returns a generator object, while model.state_dict() returns an OrderedDict.

model.parameters()                                                     
>>> <generator object Module.parameters at 0x7fb381953f90>

model.state_dict()                                                     
>>> 
OrderedDict([('backbone.0.weight', tensor([[[[ 0.1200, -0.1627, -0.0841],
                        [-0.1369, -0.1525,  0.0541],
                        [ 0.1203,  0.0564,  0.0908]],
                        ...

2. Different kinds of parameters stored
To show the difference directly, we compare model.named_parameters() with model.state_dict():

  • model.named_parameters() yields only the model's learnable parameters
  • model.state_dict(), on top of what model.parameters() provides, additionally contains all non-learnable parameters (the BN layers' running_mean, running_var, etc.), as the comparison below shows:
model_state_dict = model.state_dict()
model_named_parameters = model.named_parameters()

for k,v in model_named_parameters:
    print(k)

for k in model_state_dict:
    print(k)

###################################
## output model_named_parameters ##
###################################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias

#############################
## output model_state_dict ##
#############################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.2.running_mean
backbone.2.running_var
backbone.2.num_batches_tracked
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
backbone.6.running_mean
backbone.6.running_var
backbone.6.num_batches_tracked
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias