
[PyTorch] Fine-Tuning: Modifying a Pretrained Model's Structure and Weights

Introduction

When fine-tuning a pretrained model, we usually need to adapt its structure to the dataset at hand and the effect we want to achieve. After reading other blog posts and the torch.nn source code, I am writing this note.
To keep things concrete, the examples below use the ConvNeXt model from torchvision.

I. Getting the model's parameters

1. Print the model directly

import torch
import torchvision.models as models
import torch.nn as nn

model = models.convnext_tiny(pretrained=False)   # note: recent torchvision versions use the weights= argument instead
print(model)

###################### printed model structure ###################
ConvNeXt(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
      (1): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
    )
    (1): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0058823529411764705, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.011764705882352941, mode=row)
      )
    )
    (2): Sequential(
      (0): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(96, 192, kernel_size=(2, 2), stride=(2, 2))
    )
    (3): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.017647058823529415, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.023529411764705882, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.029411764705882353, mode=row)
      )
    )
    (4): Sequential(
      (0): LayerNorm2d((192,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(192, 384, kernel_size=(2, 2), stride=(2, 2))
    )
    (5): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.03529411764705883, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.047058823529411764, mode=row)
      )
      (3): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.052941176470588235, mode=row)
      )
      (4): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.058823529411764705, mode=row)
      )
      (5): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.06470588235294118, mode=row)
      )
      (6): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07058823529411766, mode=row)
      )
      (7): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07647058823529412, mode=row)
      )
      (8): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0823529411764706, mode=row)
      )
    )
    (6): Sequential(
      (0): LayerNorm2d((384,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(384, 768, kernel_size=(2, 2), stride=(2, 2))
    )
    (7): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.08823529411764706, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.09411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.1, mode=row)
      )
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): Sequential(
    (0): LayerNorm2d((768,), eps=1e-06, elementwise_affine=True)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Linear(in_features=768, out_features=1000, bias=True)
  )
)

2. model.state_dict()

In PyTorch, state_dict is a plain Python dictionary that maps each layer to its parameter tensors (each layer's weights, biases, and so on). It is useful in two ways: it lets you inspect the weights and biases of a particular layer, and it is the standard object used when saving a model.

torch.save(model.state_dict(), 'model_weights.pth')      # save the model's weights

# when reusing the saved weights later
model = models.convnext_tiny(pretrained=False)           # build the same architecture
model.load_state_dict(torch.load('model_weights.pth'))   # load the weights into the model
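The title's question (loading only the weights, not the model structure) usually comes down to this: once you change part of the architecture, the saved keys no longer all match, so you filter the saved state_dict down to the entries whose names and shapes still fit, then load with strict=False. A minimal sketch on a toy Sequential (the layer sizes and the in-memory buffer are illustrative stand-ins for a real checkpoint file):

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for the pretrained model (hypothetical sizes)
src = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1000))
buf = io.BytesIO()                       # stands in for 'model_weights.pth'
torch.save(src.state_dict(), buf)
buf.seek(0)

# Same backbone, but a new 4-class head: '2.weight'/'2.bias' no longer match
dst = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))
state = torch.load(buf)
own = dst.state_dict()
# keep only entries whose key exists in the new model with the same shape
filtered = {k: v for k, v in state.items()
            if k in own and v.shape == own[k].shape}
dst.load_state_dict(filtered, strict=False)   # strict=False tolerates the skipped head
```

Note that strict=False only tolerates missing or unexpected keys; a key that exists with a mismatched shape still raises an error, which is why the shape filter above is needed.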

3. model.parameters()

This method also exposes the model's parameters, but unlike the previous one, model.parameters() returns a generator that yields the parameter tensors in order from first to last, with no key names attached; state_dict, by contrast, is a dictionary in which every parameter has a key.
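The difference is easy to see on a small stand-in model (the tiny Sequential below is illustrative, not ConvNeXt):

```python
import torch.nn as nn

# A tiny stand-in model, just for illustration
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

params = list(net.parameters())   # plain tensors, in order, no names
sd = net.state_dict()             # OrderedDict keyed by submodule name

print(type(net.parameters()))     # a generator object
print(len(params), len(sd))       # same number of entries: 4 and 4
print(list(sd.keys()))            # ['0.weight', '0.bias', '2.weight', '2.bias']
```

If you want names alongside the tensors without building the full state_dict, model.named_parameters() yields (name, tensor) pairs.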

II. Modifying the model

PyTorch stores a model's submodules in a dict-like registry, so when you want to change the structure, you can refer to the part you want by its attribute name and assign a new definition to it; that replaces the corresponding part of the model.

model.classifier = nn.Linear(in_features=768, out_features=1000, bias=True)
print(model)

######################### output #####################
ConvNeXt(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
      (1): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
    )
    (1): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0058823529411764705, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.011764705882352941, mode=row)
      )
    )
    (2): Sequential(
      (0): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(96, 192, kernel_size=(2, 2), stride=(2, 2))
    )
    (3): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.017647058823529415, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.023529411764705882, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.029411764705882353, mode=row)
      )
    )
    (4): Sequential(
      (0): LayerNorm2d((192,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(192, 384, kernel_size=(2, 2), stride=(2, 2))
    )
    (5): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.03529411764705883, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.047058823529411764, mode=row)
      )
      (3): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.052941176470588235, mode=row)
      )
      (4): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.058823529411764705, mode=row)
      )
      (5): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.06470588235294118, mode=row)
      )
      (6): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07058823529411766, mode=row)
      )
      (7): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07647058823529412, mode=row)
      )
      (8): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0823529411764706, mode=row)
      )
    )
    (6): Sequential(
      (0): LayerNorm2d((384,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(384, 768, kernel_size=(2, 2), stride=(2, 2))
    )
    (7): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.08823529411764706, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.09411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.1, mode=row)
      )
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): Linear(in_features=768, out_features=1000, bias=True)
)

But as the output shows, assigning directly to classifier redefines the entire classifier as whatever you assigned. So if you only want to change the final classification layer, you would have to redefine the whole Sequential, and if you had already loaded pretrained weights, the classifier's pretrained parameters would be lost in the process, which is very inconvenient.

So how do we modify only the last linear layer inside classifier? Searching the web turned up surprisingly little: most posts only access the submodules that have their own attribute names, and for layers inside a Sequential some people tried model.classifier.0, which is not valid Python syntax. So I went to the official nn.Sequential source instead.

    def _get_item_by_idx(self, iterator, idx) -> T:
        """Get the idx-th item of the iterator"""
        size = len(self)
        idx = operator.index(idx)
        if not -size <= idx < size:
            raise IndexError('index {} is out of range'.format(idx))
        idx %= size
        return next(islice(iterator, idx, None))

Notice that the items are fetched through an iterator, so I tried accessing them with list-style indexing instead, and it turned out to work.
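Before applying this to ConvNeXt, the behavior can be checked on a toy Sequential (the layer sizes here are illustrative): nn.Sequential supports both indexed reads and indexed assignment, and replacing one entry leaves the other entries' weights untouched.

```python
import torch
import torch.nn as nn

# A toy classifier head, just for illustration
head = nn.Sequential(
    nn.LayerNorm(8),
    nn.Flatten(1),
    nn.Linear(8, 1000),
)

w_before = head[0].weight.clone()   # LayerNorm weight, not touched below
head[2] = nn.Linear(8, 4)           # swap only the final classification layer

print(head[2])                               # Linear(in_features=8, out_features=4, bias=True)
print(torch.equal(head[0].weight, w_before)) # True: earlier layers keep their weights
```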

model.classifier[2] = nn.Linear(in_features=768, out_features=4, bias=True)
print(model)

#################### output ########################
ConvNeXt(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
      (1): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
    )
    (1): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0058823529411764705, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.011764705882352941, mode=row)
      )
    )
    (2): Sequential(
      (0): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(96, 192, kernel_size=(2, 2), stride=(2, 2))
    )
    (3): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.017647058823529415, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.023529411764705882, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.029411764705882353, mode=row)
      )
    )
    (4): Sequential(
      (0): LayerNorm2d((192,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(192, 384, kernel_size=(2, 2), stride=(2, 2))
    )
    (5): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.03529411764705883, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.047058823529411764, mode=row)
      )
      (3): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.052941176470588235, mode=row)
      )
      (4): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.058823529411764705, mode=row)
      )
      (5): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.06470588235294118, mode=row)
      )
      (6): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07058823529411766, mode=row)
      )
      (7): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07647058823529412, mode=row)
      )
      (8): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0823529411764706, mode=row)
      )
    )
    (6): Sequential(
      (0): LayerNorm2d((384,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(384, 768, kernel_size=(2, 2), stride=(2, 2))
    )
    (7): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.08823529411764706, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.09411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate=none)
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.1, mode=row)
      )
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): Sequential(
    (0): LayerNorm2d((768,), eps=1e-06, elementwise_affine=True)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Linear(in_features=768, out_features=4, bias=True)
  )
)

Once we can navigate to each part of the model like this, modifying it becomes straightforward.
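The access pattern is the same everywhere: named children are reached as attributes, and members of a `Sequential` are reached by integer index. A minimal sketch using a small stand-in model built only from `torch.nn` (the layer sizes here are illustrative, not ConvNeXt's own):

```python
import torch.nn as nn

# A tiny stand-in with the same nesting pattern as the model printed above:
# a "features" child and a "classifier" Sequential head.
model = nn.Sequential()
model.add_module("features", nn.Sequential(nn.Conv2d(3, 8, kernel_size=3)))
model.add_module("classifier", nn.Sequential(nn.Flatten(1), nn.Linear(8, 4)))

head = model.classifier       # attribute access on a named child
last = model.classifier[1]    # integer index into a Sequential
print(last)                   # Linear(in_features=8, out_features=4, bias=True)
```

For the real ConvNeXt above, the same pattern gives e.g. `model.classifier[2]` for the final `Linear` layer.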

Adding a dropout layer with the add_module() method:

model.classifier.add_module("add_dropout", nn.Dropout())
print(model)
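Beyond appending a new layer, an existing layer can be swapped out by assigning to its index, which is the usual way to adapt a pretrained head to a new class count. A minimal sketch of the classifier head (here `nn.Identity` stands in for torchvision's `LayerNorm2d` so the snippet only needs torch; the index 2 follows the classifier printed above):

```python
import torch.nn as nn

# Stand-in for the classifier head: norm -> flatten -> linear.
classifier = nn.Sequential(nn.Identity(), nn.Flatten(1), nn.Linear(768, 4))

# Replace the final Linear by index assignment, e.g. for a 10-class task.
classifier[2] = nn.Linear(768, 10)

# Equivalently, add_module() with an existing name overwrites that child.
classifier.add_module("2", nn.Linear(768, 10))

print(classifier[2])  # Linear(in_features=768, out_features=10, bias=True)
```

Note that a replaced layer is freshly initialized, so its weights must be learned during fine-tuning even when the rest of the model keeps its pretrained parameters.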

References: https://blog.csdn.net/ltochange/article/details/121421776
https://blog.csdn.net/qq_39332551/article/details/124943453
