This post draws on: https://blog.csdn.net/qq_31347869/article/details/125065271
Define a custom classification network. The backbone consists of Conv, BN, ReLU, and MaxPooling layers; the classifier is a two-layer fully connected network:
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_class=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(6),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(12),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(9 * 8 * 8, 128),  # = 576; must equal the flattened backbone output
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(128, num_class)
        )

    def forward(self, x):
        output = self.backbone(x)
        output = output.view(output.size()[0], -1)  # flatten all but the batch dimension
        output = self.classifier(output)
        return output

model = Net()
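Note that in_features = 576 in the classifier does not correspond to a square input; a 32x32 image, for instance, flattens to 12*6*6 = 432 features after this backbone. A quick sanity check, shown here as a sketch rather than part of the original post, is to push a dummy tensor through the backbone and read off the flattened size:

import torch

with torch.no_grad():
    dummy = torch.zeros(1, 3, 32, 32)   # one 3-channel 32x32 image
    feat = model.backbone(dummy)        # shape: (1, 12, 6, 6) for this input
    print(feat.flatten(1).size(1))      # 432; adjust nn.Linear's in_features to match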
The network Net is itself a subclass of nn.Module. It contains backbone and classifier, two nn.Module subclasses built from Sequential containers, and each of those in turn holds a number of layers that are also nn.Module subclasses. From the outside in there are therefore three levels:

1) Net (an nn.Module subclass);
2) backbone and classifier (Sequential, nn.Module subclasses), the child modules of Net;
3) the individual layers such as conv, relu, batchnorm (nn.Module subclasses), the child modules of backbone or classifier.

Calling the introspection methods on model gives:

model.modules()
>> <generator object Module.modules at 0x7f3aff9f1740>
model.named_modules()
>> <generator object Module.named_modules at 0x7f3aff95bf20>
model.children()
>> <generator object Module.children at 0x7f3aff9f1660>
model.parameters()
>> <generator object Module.parameters at 0x7f3aff95bf90>
model.named_parameters()
>> <generator object Module.named_parameters at 0x7f3aff95bba0>
model.state_dict()
>> OrderedDict([('backbone.0.weight', tensor([[[[-0.1244, ....1406]]]])),
   ('backbone.0.bias', tensor([ 0.0838, -0.... 0.0381])),
   ('backbone.2.weight', tensor([1., 1., 1., ..., 1., 1.])),
   ('backbone.2.bias', tensor([0., 0., 0., ..., 0., 0.])),
   ('backbone.2.running_mean', tensor([0., 0., 0., ..., 0., 0.])),
   ('backbone.2.running_var', tensor([1., 1., 1., ..., 1., 1.])),
   ('backbone.2.num_batches_tracked', tensor(0)),
   ('backbone.4.weight', tensor([[[[ 0.1197, ....1024]]]])),
   ('backbone.4.bias', tensor([ 0.0406, 0.... 0.0971])),
   ('backbone.6.weight', tensor([1., 1., 1., ..., 1., 1.])),
   ('backbone.6.bias', tensor([0., 0., 0., ..., 0., 0.])),
   ('backbone.6.running_mean', tensor([0., 0., 0., ..., 0., 0.])),
   ('backbone.6.running_var', tensor([1., 1., 1., ..., 1., 1.])),
   ('backbone.6.num_batches_tracked', tensor(0)),
   ...])
Except for model.state_dict(), whose return value is an OrderedDict, each of these methods returns a generator; a for loop (or list comprehension) collects the contents into a list:
model_modules = [m for m in model.modules()]
model_named_modules = [m for m in model.named_modules()]
model_children = [m for m in model.children()]
model_named_children = [m for m in model.named_children()]
model_parameters = [m for m in model.parameters()]
model_named_parameters = [m for m in model.named_parameters()]
model.modules() iterates over all sublayers of the model, where a sublayer means any layer that subclasses nn.Module. In the Net defined above, Net itself, backbone, classifier, and all the layers they contain subclass nn.Module, so every one of them is visited, and the traversal order is depth-first. Calling .modules() on Net therefore visits, in order: Net --> backbone --> backbone layers --> classifier --> classifier layers.
len(model_modules) # 15
len(model_named_modules) # 15
Output of model_modules:
model_modules
>>
00: Net(
      (backbone): Sequential(
        (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
        (1): ReLU(inplace=True)
        (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
        (5): ReLU(inplace=True)
        (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (classifier): Sequential(
        (0): Linear(in_features=576, out_features=128, bias=True)
        (1): ReLU(inplace=True)
        (2): Dropout(p=0.5, inplace=False)
        (3): Linear(in_features=128, out_features=10, bias=True)
      )
    )
01: Sequential(
      (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
      (1): ReLU(inplace=True)
      (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
      (5): ReLU(inplace=True)
      (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
02: Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
03: ReLU(inplace=True)
04: BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
05: MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
06: Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
07: ReLU(inplace=True)
08: BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
09: MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
10: Sequential(
      (0): Linear(in_features=576, out_features=128, bias=True)
      (1): ReLU(inplace=True)
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=128, out_features=10, bias=True)
    )
11: Linear(in_features=576, out_features=128, bias=True)
12: ReLU(inplace=True)
13: Dropout(p=0.5, inplace=False)
14: Linear(in_features=128, out_features=10, bias=True)
Indexing with model_modules[index] retrieves the layer at that position.
model.named_modules() is model.modules() with layer names attached: on top of what model.modules() yields, it also returns each layer's name. Every element is a tuple whose first entry is the layer name and whose second entry is the layer itself. Apart from backbone and classifier, which were named explicitly in the model definition, all other layers are named automatically according to PyTorch's internal rules.
Output of model_named_modules:
00: ('', Net( (backbone): S...rue) ) ))
01: ('backbone', Sequential( (0): C...e=False) ))
02: ('backbone.0', Conv2d(3, 6, kernel_...de=(1, 1)))
03: ('backbone.1', ReLU(inplace=True))
04: ('backbone.2', BatchNorm2d(6, eps=1...tats=True))
05: ('backbone.3', MaxPool2d(kernel_siz...ode=False))
06: ('backbone.4', Conv2d(6, 12, kernel...de=(1, 1)))
07: ('backbone.5', ReLU(inplace=True))
08: ('backbone.6', BatchNorm2d(12, eps=...tats=True))
09: ('backbone.7', MaxPool2d(kernel_siz...ode=False))
10: ('classifier', Sequential( (0): L...as=True) ))
11: ('classifier.0', Linear(in_features=5...bias=True))
12: ('classifier.1', ReLU(inplace=True))
13: ('classifier.2', Dropout(p=0.5, inplace=False))
14: ('classifier.3', Linear(in_features=1...bias=True))
Each element of model.named_modules() is a tuple holding the name and the layer.
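Since every element is a (name, module) pair, a handy trick (a sketch, not from the referenced post) is to build a dictionary and look any layer up by its dotted name:

layer_lookup = dict(model.named_modules())
first_conv = layer_lookup['backbone.0']   # the first Conv2d layer
print(first_conv)

# Recent PyTorch versions also offer the equivalent built-in:
# model.get_submodule('backbone.0')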
Both model.modules() and model.named_modules() can be used to modify specific layers.

1) With model.modules(), use isinstance() to pick out the layers to process:
for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        # process the layer here, e.g. re-initialize its weights
        nn.init.kaiming_normal_(layer.weight)
2) With model.named_modules(), if each layer was given an explicit name in the model definition, say convolution layers named conv1, conv2, ..., the layers can be selected by name:
for name, layer in model.named_modules():
    if 'conv' in name:
        # process the layer here
        print(name, layer)
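Putting the two patterns together, here is a minimal sketch (the LeakyReLU swap is an arbitrary example, not something the referenced post does) that replaces every ReLU in the model with LeakyReLU. setattr works because Sequential registers its children as attributes keyed by their index string:

for module in model.modules():
    if isinstance(module, nn.Sequential):
        for idx, child in module.named_children():
            if isinstance(child, nn.ReLU):
                # replace the child in place; 0.1 is an arbitrary negative slope
                setattr(module, idx, nn.LeakyReLU(0.1, inplace=True))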
As noted earlier, Net splits into three levels: 1) Net; 2) Net's child layers backbone and classifier; 3) the child layers of backbone/classifier such as conv, relu, batchnorm.

model.modules() traverses all of the model's sublayers, including the sublayers of sublayers; loosely speaking, it visits every node of the tree from root to leaf, so in the example above it covers every element of all three levels.

model.children() only retrieves the second level of the network: in the example above it yields just backbone and classifier, with neither Net itself nor the sublayers of backbone/classifier. As before, model.named_children() is simply model.children() with layer names.
model_children = [m for m in model.children()]
model_named_children = [m for m in model.named_children()]
len(model_children)        # 2: backbone, classifier
len(model_named_children)  # 2: backbone, classifier
Output of model_children:
print(model_children)
>> [Sequential(
     (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
     (1): ReLU(inplace=True)
     (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
     (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
     (5): ReLU(inplace=True)
     (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
   ),
   Sequential(
     (0): Linear(in_features=576, out_features=128, bias=True)
     (1): ReLU(inplace=True)
     (2): Dropout(p=0.5, inplace=False)
     (3): Linear(in_features=128, out_features=10, bias=True)
   )]
Output of model_named_children:
print(model_named_children)
>> [('backbone', Sequential(
      (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
      (1): ReLU(inplace=True)
      (2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
      (5): ReLU(inplace=True)
      (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )),
    ('classifier', Sequential(
      (0): Linear(in_features=576, out_features=128, bias=True)
      (1): ReLU(inplace=True)
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=128, out_features=10, bias=True)
    ))]
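A typical use of model.children() (a sketch under the assumption that you want the backbone as a stand-alone feature extractor) is to rebuild a truncated model from the top-level children:

# keep everything except the last top-level child (the classifier)
feature_extractor = nn.Sequential(*list(model.children())[:-1])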
model.parameters() iterates over all of the model's learnable parameters. Some layers contain no learnable parameters (e.g. relu, maxpool), so model.parameters() yields nothing for them.
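The most common consumer of model.parameters() is an optimizer; the hyperparameters below are placeholders:

import torch.optim as optim

# hand all learnable parameters to the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)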
Correspondingly, model.named_parameters() is model.parameters() with layer names: each tuple packs two elements, the layer name and the layer's parameter. The name suffixes .weight and .bias distinguish weights from biases.
Output of model_parameters:

00: Parameter containing:
    tensor([[[[-0.1244, -0.0175,  0.1330],
              [-0.0457,  0.0432, -0.0369],
              [-0.1080,  0.1810, -0.1372]],
             ...
            ]], requires_grad=True)
01: Parameter containing:
    tensor([ 0.0838, -0.0134,  0.0715, -0.1095,  0.0014,  0.0381], requires_grad=True)
02: Parameter containing:
    tensor([1., 1., 1., 1., 1., 1.], requires_grad=True)
03: Parameter containing:
    tensor([0., 0., 0., 0., 0., 0.], requires_grad=True)
04: Parameter containing:
    tensor([[[[ 0.1197,  0.0643,  0.1191],
              [ 0.0363,  0.1028, -0.0531],
              [ 0.1358, -0.1207, -0.0255]],
             ...
            ]], requires_grad=True)
05: Parameter containing:
    tensor([ 0.0406,  0.0845,  0.0837,  0.0469,  0.0739,  0.0285,  0.1000, -0.0235,
            -0.0329, -0.0202, -0.1350,  0.0971], requires_grad=True)
06: Parameter containing:
    tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)
07: Parameter containing:
    tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
08: Parameter containing:
    tensor([[-0.0228, -0.0244,  0.0078,  ...,  0.0054, -0.0198, -0.0202],
            ...,
            [ 0.0237, -0.0183,  0.0281,  ..., -0.0131,  0.0266, -0.0002]],
           requires_grad=True)
09: Parameter containing:
    tensor([-1.1228e-02, -4.0427e-02, -3.4926e-02, -5.9942e-03,  1.4434e-02,
            -3.2394e-02,  ...], requires_grad=True)
10: Parameter containing:
    tensor([[ 0.0803, -0.0856,  0.0390,  ..., -0.0727,  0.0265,  0.0627],
            ...,
            [ 0.0554,  0.0158, -0.0131,  ..., -0.0503, -0.0812, -0.0749]],
           requires_grad=True)
11: Parameter containing:
    ...
for k, v in model.named_parameters():
print(k)
>>
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias
Here backbone.1, backbone.3, classifier.1, and classifier.2 correspond to the ReLU, MaxPool2d, ReLU, and Dropout layers respectively. These layers have no learnable parameters, so model.named_parameters() does not list them. backbone.2 and backbone.6, on the other hand, are BN layers with learnable weight and bias, so they do show up.
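This naming scheme makes selective fine-tuning straightforward. A minimal sketch (freezing the whole backbone is an arbitrary choice for illustration):

# freeze every backbone parameter so only the classifier is trained
for name, param in model.named_parameters():
    if name.startswith('backbone'):
        param.requires_grad = False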
model.state_dict() retrieves all of the parameters in the model, both learnable and non-learnable, and returns them as an ordered dictionary, OrderedDict.

As the example below shows, model.state_dict() captures every learnable parameter (weight, bias) and, in addition, the non-learnable buffers (the BN layers' running_mean, running_var, and so on). You can think of model.state_dict() as model.parameters() plus all of the non-learnable buffers.
OrderedDict([('backbone.0.weight', tensor([[[[ 0.1796,  0.0621,  0.1027],
                [-0.0723, -0.0971,  0.0218],
                [-0.0835, -0.0479,  0.0305]],
               ...
               [[-0.0544, -0.1858,  0.1559],
                [-0.0589,  0.0146, -0.1285],
                [-0.1033,  0.0743,  0.1137]]]])),
             ('backbone.0.bias', tensor([ 0.0202,  0.1326,  0.0124, -0.1895, -0.1094, -0.1045])),
             ('backbone.2.weight', tensor([1., 1., 1., 1., 1., 1.])),
             ('backbone.2.bias', tensor([0., 0., 0., 0., 0., 0.])),
             ('backbone.2.running_mean', tensor([0., 0., 0., 0., 0., 0.])),
             ('backbone.2.running_var', tensor([1., 1., 1., 1., 1., 1.])),
             ('backbone.2.num_batches_tracked', tensor(0)),
             ('backbone.4.weight', tensor([[[[ 1.3451e-01, -7.3591e-02, -1.0690e-01],
                [-5.4909e-02, -3.3993e-02,  3.3203e-02],
                [-6.4427e-02,  1.2523e-01, -3.7897e-02]],
               ...
               [[-1.0125e-01,  1.7249e-02, -6.3623e-02],
                [ 4.0353e-02, -7.0894e-02,  6.0606e-03],
                [ 6.2089e-02,  8.5485e-02,  1.0689e-01]]]])),
             ('backbone.4.bias', tensor([ 0.0999, -0.1271,  0.0010,  0.1151, -0.1221,  0.0144,
                 0.1088,  0.1214, -0.0175, -0.1071,  0.0937, -0.0058])),
             ('backbone.6.weight', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
             ('backbone.6.bias', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
             ('backbone.6.running_mean', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
             ('backbone.6.running_var', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
             ('backbone.6.num_batches_tracked', tensor(0)),
             ('classifier.0.weight', tensor([[ 0.0359,  0.0245,  0.0020,  ...,  0.0282, -0.0255, -0.0319],
                [ 0.0020,  0.0196,  0.0011,  ..., -0.0412,  0.0179,  0.0288],
                ...,
                [ 0.0092,  0.0375, -0.0229,  ..., -0.0322, -0.0065,  0.0008]])),
             ('classifier.0.bias', tensor([ 3.7528e-02, -2.4906e-02, -3.0417e-02, -2.9277e-02,  3.8544e-02,
                ...
                -1.4599e-02,  3.6207e-02,  1.8414e-02])),
             ('classifier.3.weight', tensor([[-0.0793, -0.0080,  0.0755,  ...,  0.0225,  0.0632,  0.0223],
                [-0.0861, -0.0295,  0.0301,  ..., -0.0664, -0.0458,  0.0044],
                ...,
                [-0.0077,  0.0227,  0.0247,  ..., -0.0424,  0.0134, -0.0196]])),
             ('classifier.3.bias', tensor([-0.0307,  0.0848,  0.0686,  0.0819,  0.0455,  0.0711,
                 0.0073,  0.0117,  0.0293,  0.0431]))])
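Because the buffers (running_mean, running_var, num_batches_tracked) are included, state_dict() is the standard vehicle for checkpointing: a restored BN layer behaves identically at eval time. A minimal save/load sketch (the file name is arbitrary):

torch.save(model.state_dict(), 'net.pth')     # serialize all params + buffers
model.load_state_dict(torch.load('net.pth'))  # restore them into the model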
model.parameters() and model.state_dict() differ in two main ways.

1. Different return types

model.parameters() returns a generator object, while model.state_dict() returns an ordered dictionary, OrderedDict.
model.parameters()
>>> <generator object Module.parameters at 0x7fb381953f90>
model.state_dict()
>>>
OrderedDict([('backbone.0.weight', tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
...
2. Different kinds of parameters stored

To show the difference directly, compare model.named_parameters() against model.state_dict():
model_state_dict = model.state_dict()
model_named_parameters = model.named_parameters()

for k, v in model_named_parameters:
    print(k)
for k in model_state_dict:
    print(k)

###################################
## output model_named_parameters ##
###################################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias

#############################
## output model_state_dict ##
#############################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.2.running_mean
backbone.2.running_var
backbone.2.num_batches_tracked
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
backbone.6.running_mean
backbone.6.running_var
backbone.6.num_batches_tracked
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias
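One further difference worth knowing (not mentioned in the referenced post; verify on your PyTorch version): by default state_dict() detaches the tensors it returns, so its values carry requires_grad=False even though the underlying parameters are trainable:

p = next(model.parameters())
print(p.requires_grad)                        # True: a live nn.Parameter

sd = model.state_dict()
print(sd['backbone.0.weight'].requires_grad)  # False: detached by default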