Knowledge Distillation for Beginners in Practice: MNIST Handwritten Digit Recognition

Author: AllinToyou | 2024-06-17 15:32:40


Environment

Python 3.8.8
CUDA 10.1
torch 1.5

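Dataset

MNIST is an entry-level computer vision dataset made up of images of handwritten digits:
[Figure: four sample MNIST digit images]
Each image also has a label that says which digit it shows; for example, the labels of the four images above are 5, 0, 4 and 1. The dataset contains 60,000 training examples (mnist.train) and 10,000 test examples (mnist.test). Here the dataset is downloaded with the torchvision library.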

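Teacher class

The teacher model is a three-layer fully connected network with 1200 neurons in each hidden layer.

The code is as follows (example):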

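import torch
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
import torchvision
from torchvision import transforms

class TeacherModel(nn.Module):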
    def __init__(self, in_channels=1, num_classes=10):
        super(TeacherModel, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(784, 1200)
        self.fc2 = nn.Linear(1200, 1200)
        self.fc3 = nn.Linear(1200, num_classes)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc2(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc3(x)

        return x
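
The `x.view(-1, 784)` call in `forward` flattens each 1×28×28 MNIST image into a vector of 784 pixels before the first linear layer. A minimal sketch of that shape change, using a random tensor as a stand-in for a real batch (the batch size of 32 matches the DataLoader settings used later; the variable names are only illustrative):

```python
import torch
from torch import nn

# Stand-in for one batch from the DataLoader: 32 grayscale 28x28 images.
# Random values here; real MNIST pixels after ToTensor() lie in [0, 1].
batch = torch.rand(32, 1, 28, 28)

# view(-1, 784) keeps the batch dimension and flattens 1 * 28 * 28 = 784 pixels
flat = batch.view(-1, 784)
print(flat.shape)  # torch.Size([32, 784])

# The first teacher layer then maps 784 inputs to 1200 hidden units
hidden = nn.Linear(784, 1200)(flat)
print(hidden.shape)  # torch.Size([32, 1200])
```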

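Student class

The student model is a much smaller three-layer fully connected network with 20 neurons in each hidden layer.

The code is as follows (example):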

class StudentModel(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super(StudentModel, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(784, 20)
        self.fc2 = nn.Linear(20, 20)
        self.fc3 = nn.Linear(20, num_classes)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc2(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc3(x)

        return x
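
To get a feel for how much smaller the student is, the parameter counts of both networks can be worked out from the layer sizes alone. The sketch below is plain arithmetic over the layer widths from the two class definitions; `mlp_param_count` is a hypothetical helper, not part of the original code.

```python
def mlp_param_count(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

teacher_params = mlp_param_count([784, 1200, 1200, 10])
student_params = mlp_param_count([784, 20, 20, 10])

print(teacher_params)  # 2395210
print(student_params)  # 16330
print(round(teacher_params / student_params, 1))  # 146.7
```

Under these layer sizes the teacher has roughly 147 times as many parameters as the student.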

Training and evaluating the teacher network

torch.manual_seed(0)

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.backends.cudnn.benchmark = True

# Training and test splits, downloaded to dataset/ on first use
X_train = torchvision.datasets.MNIST(
    root="dataset/",
    train=True,
    transform=transforms.ToTensor(),
    download=True
)

X_test = torchvision.datasets.MNIST(
    root="dataset/",
    train=False,
    transform=transforms.ToTensor(),
    download=True
)

train_loader = DataLoader(dataset=X_train, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=X_test, batch_size=32, shuffle=False)

model = TeacherModel()
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

epochs = 6

for epoch in range(epochs):
    model.train()

    # One pass over the training set
    for data, target in tqdm(train_loader):
        data = data.to(device)
        target = target.to(device)
        preds = model(data)
        loss = criterion(preds, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluate on the test set with dropout disabled
    model.eval()
    num_correct = 0
    num_samples = 0

    with torch.no_grad():
        for x, y in test_loader:
            x = x.to(device)
            y = y.to(device)
            preds = model(x)
            predictions = preds.max(1).indices
            num_correct += (predictions.eq(y)).sum().item()
            num_samples += predictions.size(0)
        acc = num_correct / num_samples

    model.train()
    print('Epoch:{}\t Acc:{:.4f}'.format(epoch + 1, acc))

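[Figure: teacher model training log]
After six epochs of training, the teacher reaches a test accuracy of Acc = 0.9795.

Note: depending on the torch version, the following statements may need to be written differently.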

with torch.no_grad():
    for x, y in test_loader:
        x = x.to(device)
        y = y.to(device)
        preds = model(x)
        predictions = preds.max(1).indices
        num_correct += (predictions.eq(y)).sum().item()
        num_samples += predictions.size(0)
    acc = num_correct / num_samples

If the version above causes problems, try the one below.

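with torch.no_grad():
    for x, y in test_loader:
        x = x.to(device)
        y = y.to(device)
        preds = model(x)
        predictions = preds.max(1).indices
        num_correct += (predictions == y).sum()
        num_samples += predictions.size(0)
    # Convert the tensor count to a Python number before dividing, so the
    # result does not depend on how the torch version divides integer tensors
    acc = float(num_correct) / num_samples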
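
Since the same evaluation loop is repeated for every model, it can also be wrapped in a helper. The following is a minimal sketch, not the author's code: `evaluate` is a hypothetical function name, and it uses `argmax(dim=1)` in place of `max(1).indices`, which gives the same predicted class index. The toy check at the end uses `nn.Identity` as a stand-in model so that the input tensor is treated directly as logits.

```python
import torch
from torch import nn

def evaluate(model, loader, device):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()  # disable dropout during evaluation
    num_correct = 0
    num_samples = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            predictions = model(x).argmax(dim=1)  # index of the largest logit
            num_correct += (predictions == y).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples

# Toy check: one hand-made batch of 4 samples with 2 classes
device = torch.device("cpu")
identity = nn.Identity()  # stand-in "model" that returns its input as logits
logits = torch.tensor([[2.0, 0.1], [0.3, 0.9], [1.5, 0.2], [0.1, 0.4]])
labels = torch.tensor([0, 1, 1, 1])
acc = evaluate(identity, [(logits, labels)], device)
print(acc)  # 0.75
```

With the real models, the call would look like `evaluate(model, test_loader, device)`.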

Training and evaluating the student model

torch.manual_seed(0)
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.backends.cudnn.benchmark = True
X_train = torchvision.datasets.MNIST(
    root="dataset/",
    train=True,
    transform=transforms.ToTensor(),
    download=True
)
X_test = torchvision.datasets.MNIST(
    root="dataset/",
    train=False,
    transform=transforms.ToTensor(),
    download=True
)
train_loader = DataLoader(dataset=X_train, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=X_test, batch_size=32, shuffle=False)

model = StudentModel()
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

epochs = 3
for epoch in range(epochs):
    model.train()

    # One pass over the training set, using hard labels only
    for data, target in tqdm(train_loader):
        data = data.to(device)
        target = target.to(device)
        preds = model(data)
        loss = criterion(preds, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluate on the test set with dropout disabled
    model.eval()
    num_correct = 0
    num_samples = 0

    with torch.no_grad():
        for x, y in test_loader:
            x = x.to(device)
            y = y.to(device)
            preds = model(x)
            predictions = preds.max(1).indices
            num_correct += (predictions.eq(y)).sum().item()
            num_samples += predictions.size(0)
        acc = num_correct / num_samples

    model.train()
    print('Epoch:{}\t Acc:{:.4f}'.format(epoch + 1, acc))

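[Figure: student model training log]
After three epochs of training, the student model reaches an accuracy of 0.8314, far below the teacher model.

Training the student model with knowledge distillation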

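After 20 epochs of training, acc = 0.9007. I did not tune any hyperparameters, so I am not sure whether a higher accuracy is reachable, but this result is already a clear improvement over training the student model from scratch.
[Figure: distillation training log]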
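
The distillation step combines two terms: a cross-entropy loss against the true labels (hard loss) and a KL-divergence loss between the temperature-softened student and teacher distributions (soft loss). The sketch below isolates that combination in one function, using the same `temp = 7` and `alpha = 0.3` values as the `kd` function in the full code. The function name `distillation_loss` and the random logits are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target, temp=7.0, alpha=0.3):
    """Weighted sum of the hard-label loss and the soft-label (teacher) loss."""
    hard = F.cross_entropy(student_logits, target)
    soft = F.kl_div(
        F.log_softmax(student_logits / temp, dim=1),  # student log-probabilities
        F.softmax(teacher_logits / temp, dim=1),      # softened teacher probabilities
        reduction="batchmean",
    )
    return alpha * hard + (1 - alpha) * soft

# Dividing logits by the temperature flattens the distribution, which exposes
# how the teacher ranks the wrong classes as well as the right one.
example_logits = torch.tensor([[8.0, 2.0, 0.0]])
sharp = F.softmax(example_logits, dim=1)
softened = F.softmax(example_logits / 7.0, dim=1)

# Random logits as stand-ins for real model outputs on a batch of 4 images
torch.manual_seed(0)
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
target = torch.tensor([5, 0, 4, 1])
loss = distillation_loss(student_logits, teacher_logits, target)

# When the student matches the teacher exactly, the soft term vanishes
matched = distillation_loss(teacher_logits, teacher_logits, target)
```

Hinton et al.'s original formulation additionally multiplies the soft term by `temp ** 2` so that its gradient magnitude stays comparable across temperatures. The full code in this article does not do so, and this sketch follows the full code.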

Full code

import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm
import torchvision
from torchvision import transforms

class TeacherModel(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super(TeacherModel, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(784, 1200)
        self.fc2 = nn.Linear(1200, 1200)
        self.fc3 = nn.Linear(1200, num_classes)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc2(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc3(x)

        return x

class StudentModel(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super(StudentModel, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(784, 20)
        self.fc2 = nn.Linear(20, 20)
        self.fc3 = nn.Linear(20, num_classes)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc2(x)
        x = self.dropout(x)
        x = self.relu(x)

        x = self.fc3(x)

        return x

def teacher(device, train_loader, test_loader):
    print('--------------teachermodel start--------------')
    model = TeacherModel()
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    epochs = 6
    for epoch in range(epochs):
        model.train()

        for data, target in tqdm(train_loader):
            data = data.to(device)
            target = target.to(device)
            preds = model(data)
            loss = criterion(preds, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        model.eval()
        num_correct = 0
        num_samples = 0

        with torch.no_grad():
            for x, y in test_loader:
                x = x.to(device)
                y = y.to(device)
                preds = model(x)
                predictions = preds.max(1).indices
                num_correct += (predictions.eq(y)).sum().item()
                num_samples += predictions.size(0)
            acc = num_correct / num_samples

        model.train()
        print('Epoch:{}\t Acc:{:.4f}'.format(epoch + 1, acc))
    torch.save(model, 'teacher.pkl')
    print('--------------teachermodel end--------------')

def student(device, train_loader, test_loader):
    print('--------------studentmodel start--------------')

    model = StudentModel()
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    epochs = 3
    for epoch in range(epochs):
        model.train()

        for data, target in tqdm(train_loader):
            data = data.to(device)
            target = target.to(device)
            preds = model(data)
            loss = criterion(preds, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        model.eval()
        num_correct = 0
        num_samples = 0

        with torch.no_grad():
            for x, y in test_loader:
                x = x.to(device)
                y = y.to(device)
                # print(y)
                preds = model(x)
                #             print(preds)
                predictions = preds.max(1).indices
                # print(predictions)
                num_correct += (predictions.eq(y)).sum().item()
                num_samples += predictions.size(0)
            acc = num_correct / num_samples

        model.train()
        print('Epoch:{}\t Acc:{:.4f}'.format(epoch + 1, acc))
    print('--------------studentmodel prediction end--------------')

def kd(teachermodel, device, train_loader, test_loader):
    print('--------------kdmodel start--------------')

    teachermodel.eval()

    studentmodel = StudentModel()
    studentmodel = studentmodel.to(device)
    studentmodel.train()

    temp = 7    # distillation temperature
    alpha = 0.3 # weight of the hard-label loss; the soft loss gets 1 - alpha

    hard_loss = nn.CrossEntropyLoss()
    soft_loss = nn.KLDivLoss(reduction='batchmean')

    optimizer = torch.optim.Adam(studentmodel.parameters(), lr=1e-4)

    epochs = 20
    for epoch in range(epochs):
        for data, target in tqdm(train_loader):
            data = data.to(device)
            target = target.to(device)

            with torch.no_grad():
                teacher_preds = teachermodel(data)

            student_preds = studentmodel(data)
            student_loss = hard_loss(student_preds, target) #hard_loss

            distillation_loss = soft_loss(
                F.log_softmax(student_preds / temp, dim=1),
                F.softmax(teacher_preds / temp, dim=1)
            )   #soft_loss

            loss = alpha * student_loss + (1 - alpha) * distillation_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        studentmodel.eval()
        num_correct = 0
        num_samples = 0

        with torch.no_grad():
            for x, y in test_loader:
                x = x.to(device)
                y = y.to(device)
                preds = studentmodel(x)
                predictions = preds.max(1).indices
                num_correct += (predictions.eq(y)).sum().item()
                num_samples += predictions.size(0)
            acc = num_correct / num_samples

        studentmodel.train()
        print('Epoch:{}\t Acc:{:.4f}'.format(epoch + 1, acc))
    print('--------------kdmodel end--------------')


if __name__ == '__main__':
    torch.manual_seed(0)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.backends.cudnn.benchmark = True
    # Load the dataset
    X_train = torchvision.datasets.MNIST(
        root="dataset/",
        train=True,
        transform=transforms.ToTensor(),
        download=True
    )

    X_test = torchvision.datasets.MNIST(
        root="dataset/",
        train=False,
        transform=transforms.ToTensor(),
        download=True
    )

    train_loader = DataLoader(dataset=X_train, batch_size=32, shuffle=True)
    test_loader = DataLoader(dataset=X_test, batch_size=32, shuffle=False)

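    # Train the teacher model from scratch and evaluate it
    teacher(device, train_loader, test_loader)

    # Train the student model from scratch and evaluate it
    student(device, train_loader, test_loader)

    # Train the student model with knowledge distillation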
    model = torch.load('teacher.pkl')
    kd(model, device, train_loader, test_loader)
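
The script above pickles the whole teacher object with `torch.save(model, 'teacher.pkl')`, which ties the saved file to the class definition. A common alternative, not what the script above does, is to save only the model's `state_dict` and load it into a freshly constructed model. A minimal sketch, assuming a small `nn.Linear` as a stand-in for `TeacherModel` and a hypothetical file name:

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(784, 10)  # stand-in for a trained TeacherModel

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "teacher_state.pt")  # hypothetical file name

    # Save only the parameters, not the Python object
    torch.save(model.state_dict(), path)

    # Rebuild the architecture, then load the saved parameters into it
    restored = nn.Linear(784, 10)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to evaluation mode before using it as a teacher
```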

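Summary

This article is a small exercise I completed after studying the knowledge distillation lecture by @同济子豪兄 (https://www.bilibili.com/video/BV1zP4y1F7g4/?spm_id_from=333.788). I hope it helps other learners.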

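Disclaimer: This article was contributed by a user and does not represent the position of 【wpsshop博客】. Copyright belongs to the original author, and this site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/AllinToyou/article/detail/731770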