Udacity Computer Vision: CNN Notes

Wherever you are, the best time to begin networking and finding a community within the robotics industry is now. The more ingrained you are in the robotics industry, the more likely an employer will perceive you as a roboticist.

Top 3 Platforms for In-Person Networking

There’s no secret about where to find robotics networking events - most organizations and groups use the same platforms as everyone else: Meetup, Eventbrite and Facebook.

  1. Meetup has over 2,200 robotics meetups around the world. Depending on your interests, you can go to meetups focused on NLP, machine learning, autonomous cars, and more!
  2. Eventbrite is a popular platform for small and large groups. You’ll find robotics-related events everywhere from small villages to large cities.
  3. Facebook Events are really popular with university and local groups. You’ll likely find maker-space type events and academic talks here.

Top 4 Platforms for Online Networking

  1. Udacity’s Robotics Slack Community. As a member, you can chat with a wide audience, from company CEOs to hobbyists. Industry professionals all over the world are active, from discussing new research to hosting mini robotics challenges.
  2. IEEE. By joining IEEE, you’re plugged into an international professional organization, where you are able to connect with experts and get industry news before anyone else does. If you’re a student with an .edu email address, you get a heavily discounted membership price.
  3. Silicon Valley Robotics. One of the bigger professional networks out there, membership gets you events and job boards, in addition to access to a larger community. If you’re not in Silicon Valley and there’s not a similar group in your area, start your own! Professional networks start with a few people talking about their interests and career goals in a café, and can grow to encompass thousands of people.
  4. LinkedIn Groups. Unsurprisingly, LinkedIn is the best platform to connect with other professionals with shared interests - this is a primary function of LinkedIn! In addition to joining robotics-related groups, look for large communities, such as alumni networks. Best of all, LinkedIn is free!

Learn More

Stay up to date on robotics news. When you go to these events or are talking to another roboticist, you want things to talk about!

  • Robohub has daily news, including interviews with industry professionals.
  • DataTau is focused on data science and many of the surfaced articles and blogs are related to machine learning and artificial intelligence.
  • Udacity Talks, in addition to TED Talks and similar platforms, let you see into the minds of leading roboticists. Check out Sebastian Thrun’s interview with Nest’s CTO Yoky Matsuoka and Rodney Brooks’s TED Talk on why we need robots.

Intro to Neural Network

After this lesson, you will have a foundation for understanding how to build powerful neural networks from the ground up.

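CNN Structure

A classification CNN takes an input image and outputs a distribution of class scores, from which we can find the most likely class for a given image. As you go through this lesson, you may find this blog post useful; it describes the image classification pipeline and the layers that make up a CNN.

CNN Layers

A CNN itself is made up of several kinds of layers: layers that extract features from the input image, layers that reduce the dimensionality of the input, and layers that finally produce the class scores. In this lesson we cover all of these layers so that you understand how to define and train a complete CNN!

Figure: The detailed layers that make up a classification CNN.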

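Optional: Review and Learn PyTorch

For this course you must know how neural networks are trained via backpropagation, and which loss functions are used to train a CNN for classification tasks. If you would like to review this material, see the optional section "Review - Training a Neural Network" (at the bottom of the list of lessons on the main course page) and watch the videos in that section carefully.

This review section covers:

  • How neural networks are trained and how their weights are updated via backpropagation
  • How to build models in PyTorch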

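Why PyTorch?

We use PyTorch throughout this course. PyTorch is a relatively new framework, but it feels fast and intuitive compared with TensorFlow's variables and sessions. PyTorch is designed to look and act a lot like normal Python code: PyTorch neural networks have their layers and feedforward behavior defined in a class. Defining a network in a class means that you can instantiate multiple networks, dynamically change the structure of a model, and call these class functions during training and testing.

PyTorch is also well suited to testing different model architectures, which is highly encouraged in this course! PyTorch networks are modular, which makes it easy to change a single layer in a network or modify the loss function and see the effect on training. If you would like to see a comparison of PyTorch and TensorFlow, this blog post is recommended reading.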

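Pre-processing

Look at the steps below to see how pre-processing plays an important role in creating this dataset.

Figure: Pre-processing steps for creating the FashionMNIST data.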

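Convolutional Neural Networks (CNNs)

The type of neural network that is most powerful for image processing tasks, such as sorting images into groups, is the convolutional neural network (CNN). CNNs consist of layers that process visual information. A CNN first takes in an input image and then passes it through these layers. There are a few different types of layers, and we start with the most commonly used ones: convolutional, pooling, and fully-connected layers.

First, let's look at a complete CNN architecture; below is a network called VGG-16, which has been trained to recognize a variety of image classes. It takes an image as input and outputs a predicted class for that image. The various layers are labeled, and the next few videos cover each type of layer in this network.

Figure: VGG-16 architecture.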

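Convolutional Layer

The first layer in this network is a convolutional layer, which processes the input image directly.

  • A convolutional layer takes in an image as input.
  • As its name suggests, a convolutional layer is made of a set of convolutional filters (which you have already seen and programmed).
  • Each filter extracts a specific kind of feature; a high-pass filter, for example, is often used to detect the edges of objects.
  • The output of a given convolutional layer is a set of feature maps (also called activation maps), which are filtered versions of the original input image.

Activation Function

You may also have noticed that the diagram reads "convolution + ReLU". ReLU stands for the Rectified Linear Unit activation function. This activation function is zero when the input x <= 0 and a straight line with slope 1 when x > 0. ReLU and other activation functions are typically placed after a convolutional layer to slightly transform the output so that backpropagation is more efficient and the network trains effectively.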
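As a minimal sketch of the definition above, ReLU can be written in a few lines of plain Python. The helper name `relu` and the toy feature map are chosen here purely for illustration; in the PyTorch code later in these notes the same operation is `F.relu`.

```python
def relu(x):
    """Rectified Linear Unit: 0 for x <= 0, x itself (slope 1) for x > 0."""
    return x if x > 0 else 0.0


# apply ReLU element-wise to a small 2x3 "feature map" (made-up values)
feature_map = [[-2.0, 0.0, 1.5],
               [3.0, -0.5, 0.25]]
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)
```

Negative responses are clipped to zero, while positive responses pass through unchanged.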

We can see a clear white line defining the right edge of the car. This is because all of the corresponding regions in the car image closely resemble the filter, where we have a vertical line of dark pixels to the left of a vertical line of lighter pixels.

This image, for instance, contains many regions that would be discovered or detected by one of the four filters we defined before.

Filters that function as edge detectors are very important in CNNs, and we’ll revisit them later.

Now we understand how convolution works on a grayscale image.
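To make the edge-detection idea concrete, here is a small sketch in plain Python. The function name `convolve2d` and the toy image are assumptions made for illustration; the 4x4 filter uses the same dark-left, light-right pattern described above. Like most deep learning libraries, it slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    sum of element-wise products at each position."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# vertical-edge filter: negative weights on the left, positive on the right
edge_filter = [[-1, -1, 1, 1]] * 4

# toy 4x8 grayscale image: dark (0) left half, light (1) right half
image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]

response = convolve2d(image, edge_filter)
print(response)
```

The response is largest where the window is centered on the dark-to-light boundary and zero over uniform regions, which is why edges show up as bright lines in the filtered image.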

Q: A color image is a 3D array, best conceptualized as a stack of three two-dimensional matrices (one per color channel). So how do we perform a convolution on a color image?

A: In the same way, only now the filter is itself three-dimensional, with a value for each color channel.

You can think of each of the feature maps in a convolutional layer along the same lines as an image channel, and stack them to get a 3D array. We can then use this 3D array as input to yet another convolutional layer to discover patterns within the patterns that we discovered in the first convolutional layer.

Remember that, in some sense, convolutional layers aren’t too different from the dense layers that you saw in the previous section. **Dense layers are fully connected**, meaning that their nodes are connected to every node in the previous layer. **Convolutional layers are locally connected**, meaning that their nodes are connected to only a small subset of the previous layer's nodes. **Convolutional layers also have this added parameter sharing**. But in both dense layers and convolutional layers, inference works the same way.
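A rough way to see what local connectivity and parameter sharing buy is to count parameters. The sketch below uses two hypothetical helper functions and assumes a 28x28 grayscale input, 10 output nodes or filters, and 3x3 kernels; these numbers are chosen only for illustration.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus one bias per output node of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_param_count(in_channels, out_channels, kernel_size):
    """Each filter has kernel_size * kernel_size * in_channels shared weights
    plus one bias, regardless of the image size."""
    return out_channels * (kernel_size * kernel_size * in_channels + 1)


# a 28x28 grayscale image fed to 10 dense nodes vs. 10 filters of size 3x3
dense_params = dense_param_count(28 * 28, 10)
conv_params = conv_param_count(1, 10, 3)
print(dense_params, conv_params)
```

Because every position in the image reuses the same filter weights, the convolutional layer needs far fewer parameters than the dense layer on the same input.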

In the case of CNNs, the weights take the form of convolutional filters. Those filters are randomly generated, and so are the patterns that they’re initially designed to detect.

As with MLPs, when we construct a CNN we always specify a loss function. In the case of multiclass classification this is the categorical cross entropy loss. Then, as we train the model through backpropagation, the filters are updated at each epoch to take on values that minimize the loss function.

In other words, the CNN determines what kind of patterns it needs to detect based on the loss function. We’ll visualize these patterns later and see that, for instance, if our dataset contains dogs, the CNN is able to learn, on its own, filters that look like dogs.

So, to emphasize: with CNNs we don’t specify the values of the filters or tell the CNN what kind of patterns it needs to detect. These are learned from the data.

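Defining Layers in PyTorch

Defining a Network Architecture

The various layers that make up any neural network are documented here. For a convolutional neural network, we use a simple series of layers:

  • Convolutional layers

  • Max pooling layers

  • Fully-connected (linear) layers


    To define a neural network in PyTorch, you create and name a new neural network class, define the layers of the network in the function __init__, and define the feedforward behavior of the network, which uses those initialized layers, in the function forward; forward takes in an input image tensor x. The structure of such a class, called Net, is shown below.

    Note: During training, PyTorch performs backpropagation by keeping track of the network's feedforward behavior and using autograd to calculate the updates to the network's weights.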

    import torch.nn as nn
    import torch.nn.functional as F
    
    class Net(nn.Module):
    
        def __init__(self, n_classes):
            super(Net, self).__init__()
    
            # 1 input image channel (grayscale), 32 output channels/feature maps
            # 5x5 square convolution kernel
            self.conv1 = nn.Conv2d(1, 32, 5)
    
            # maxpool layer
            # pool with kernel_size=2, stride=2
            self.pool = nn.MaxPool2d(2, 2)
    
            # fully-connected layer
            # 32*4 input size to account for the downsampled image size after pooling
            # (e.g. an 8x8 input: 5x5 conv -> 4x4, 2x2 pool -> 2x2, so 32*2*2 = 32*4 features)
            # n_classes outputs (one score per image class)
            self.fc1 = nn.Linear(32*4, n_classes)
    
        # define the feedforward behavior
        def forward(self, x):
            # one conv/relu + pool layers
            x = self.pool(F.relu(self.conv1(x)))
    
            # prep for linear layer by flattening the feature maps into feature vectors
            x = x.view(x.size(0), -1)
            # linear layer 
            x = F.relu(self.fc1(x))
    
            # final output
            return x
    
    # instantiate and print your Net
    n_classes = 20 # example number of classes
    net = Net(n_classes)
    print(net)
    

Let's walk through what this code does in detail.

Defining layers in __init__

Convolutional and max pooling layers are defined in __init__:

# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)

# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)      

Referencing layers in forward

These layers are then referenced in the forward function like this, where the conv1 layer has a ReLU activation applied to it before max pooling is applied:

x = self.pool(F.relu(self.conv1(x)))

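Best practice is to place any layers whose weights will change during training in __init__ and refer to them in the forward function; any layers or functions that always behave the same way, such as a pre-defined activation function, may appear in either __init__ or forward. This is mostly a matter of style and readability.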

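Notebook: Visualizing a Convolutional Layer

Convolutional Layer

In this notebook, we visualize four filtered outputs (a.k.a. feature maps) of a convolutional layer.

Import the image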

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'images/udacity_sdc.png'

# load color image 
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)

# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255

# plot image

plt.imshow(gray_img, cmap='gray')
plt.show()

Define and visualize the filters

import numpy as np

## TODO: Feel free to modify the numbers here, to try out another filter!
filter_vals = np.array([[-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1]])

print('Filter shape: ', filter_vals.shape)
# Defining four different filters, 
# all of which are linear combinations of the `filter_vals` defined above

# define four filters
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])

# For an example, print out the values of filter 1 (index 0 in the array)
print('Filter 1: \n', filters[0])
### do not modify the code below this line ###

# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y]<0 else 'black')

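Define a convolutional layer

Initialize a single convolutional layer so that it contains all of the filters you created. Note that you are not training this network here; you are initializing the weights in the convolutional layer so that you can visualize what happens after a forward pass through this network.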

import torch
import torch.nn as nn
import torch.nn.functional as F

    
# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):
    
    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        
        # returns both layers
        return conv_x, activated_x
    
# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

# print out the layer in the network
print(model)

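Visualize the output of each filter

First, we define a helper function, viz_layer, which takes in a specific layer and a number of filters (an optional argument) and displays the output of that layer once an image has been passed through.

# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))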

Let's look at how the output of the convolutional layer differs before and after the ReLU activation function is applied.

# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

    
# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)

# visualize the output of a conv layer
viz_layer(conv_layer)
# visualize the output of an activated conv layer
viz_layer(activated_layer)

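VGG-16 Architecture

Take a look at the layers that come after the initial convolutional layers in the VGG-16 architecture.

Figure: VGG-16 architecture.

Pooling Layer

After a couple of convolutional layers (plus ReLU), you will see a max pooling layer in the VGG-16 network.

  • Pooling layers take in an image (usually a filtered image) and output a reduced version of that image
  • Pooling layers reduce the dimensionality of an input
  • Max pooling layers look at areas in an input image (like the 4x4 pixel area pictured below) and keep the maximum pixel value of each area in a new, reduced-size area.
  • Max pooling is the most common type of pooling layer in CNNs, but there are other types as well, such as average pooling.

Figure: Max pooling with a 2x2 window and a stride of 2.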
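The following plain-Python sketch mimics max pooling with a 2x2 window and a stride of 2 on a single feature map. The function name `max_pool2d` and the 4x4 values are made up for illustration; in the PyTorch code in these notes the corresponding layer is `nn.MaxPool2d(2, 2)`.

```python
def max_pool2d(feature_map, window=2, stride=2):
    """Keep the maximum value in each window x window patch."""
    out_h = (len(feature_map) - window) // stride + 1
    out_w = (len(feature_map[0]) - window) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            patch = [feature_map[r * stride + i][c * stride + j]
                     for i in range(window) for j in range(window)]
            row.append(max(patch))
        pooled.append(row)
    return pooled


# a 4x4 feature map is reduced to 2x2
feature_map = [[1, 9, 6, 4],
               [5, 4, 7, 8],
               [5, 1, 2, 9],
               [6, 7, 6, 0]]
pooled = max_pool2d(feature_map)
print(pooled)
```

Each output value summarizes one non-overlapping 2x2 patch, so the width and height are both halved.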

Next, let's look more closely at how these pooling layers work.

These so-called pooling layers often take convolutional layers as input. Recall that a convolutional layer is a stack of feature maps, where we have one feature map for each filter. A complicated dataset with many different object categories will require a large number of filters, each responsible for finding a pattern in the image. **More filters mean a bigger stack, which means that the dimensionality of our convolutional layers can get quite large. Higher dimensionality means we’ll need to use more parameters, which can lead to overfitting.** Thus, we need a method for reducing this dimensionality. This is the role of pooling layers within a convolutional neural network.

  1. The first type is a max pooling layer.

    Max pooling layers take a stack of feature maps as input. To construct the max pooling layer, we work with each feature map separately.

  2. Global average pooling

    For a layer of this type we specify neither window size nor stride. This type of pooling is a more extreme type of dimensionality reduction: each feature map in the stack is reduced to a single value, the average of all of its nodes.
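Global average pooling is simple enough to sketch directly. The function name `global_average_pool` and the three toy 2x2 feature maps below are assumptions used only to illustrate the reduction.

```python
def global_average_pool(feature_maps):
    """Reduce each 2D feature map to the mean of all its values."""
    averages = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        averages.append(sum(values) / len(values))
    return averages


# a stack of three 2x2 feature maps becomes a vector of three numbers
stack = [[[1.0, 3.0], [5.0, 7.0]],
         [[0.0, 0.0], [0.0, 8.0]],
         [[2.0, 2.0], [2.0, 2.0]]]
pooled_vector = global_average_pool(stack)
print(pooled_vector)
```

No window size or stride appears anywhere: the whole map is the window.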

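Notebook: Visualizing a Pooling Layer

The visualization code was already implemented above (the viz_layer helper); here it is applied to pooled_layer, the output of the pooling layer.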

# visualize the output of the pooling layer
viz_layer(pooled_layer)


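VGG-16 Architecture

Take a look at the layers near the end of this model: the fully-connected layers that come after a series of convolutional and pooling layers. Take note of their flattened shape.

Figure: VGG-16 architecture.

Fully-Connected Layer

A fully-connected layer's job is to connect the input it sees to a desired form of output. Typically, this means converting a matrix of image features into a feature vector whose dimensions are 1xC, where C is the number of classes. As an example, say we are sorting images into ten classes: you could give a fully-connected layer a set of [pooled, activated] feature maps as input and tell it to use a combination of these features (multiplying them, adding them, combining them, and so on) to output a 10-item long feature vector. This vector compresses the information from the feature maps into a single feature vector.

Softmax

The very last layer you see in this network is a softmax function. The softmax function can take any vector of values as input and returns a vector of the same length whose values are all in the range (0, 1) and add up to 1. This function is often seen in classification models that have to turn a feature vector into a probability distribution.

Consider the same example again: a network that sorts images into one of 10 classes. The fully-connected layer can turn feature maps into a single feature vector that has dimensions 1x10. The softmax function then turns that vector into a 10-item long probability distribution, in which each number represents the probability that a given input image falls into class 1, class 2, class 3, ... class 10. This output is sometimes called the class scores, and from these scores you can extract the most likely class for the given image!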
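As an illustrative sketch of the softmax step, the plain-Python function below converts a vector of raw scores into a probability distribution and picks the most likely class. The three scores are made-up values; a real network with 10 classes would produce a 10-item vector.

```python
import math


def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    # subtracting the max does not change the result but avoids overflow
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The largest score always receives the largest probability, so taking the index of the maximum probability gives the predicted class.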

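Overfitting

Convolutional, pooling, and fully-connected layers are all you need to construct a complete CNN, but there are additional layers that you can add to avoid overfitting. One of the most common layers for preventing overfitting is the dropout layer.

Dropout layers essentially turn off certain nodes in a layer with some probability, p. This ensures that all nodes get an equal chance to try to classify different images during training, and it reduces the likelihood that only a few heavily-weighted nodes will dominate the process.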
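A minimal sketch of dropout in plain Python follows. It uses the "inverted dropout" convention, in which surviving activations are scaled by 1 / (1 - p) during training; this is also how PyTorch's `nn.Dropout` behaves. The function name `dropout`, its `training` flag, and the seeded random generator are assumptions made for this example.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Zero each activation with probability p during training.

    Survivors are scaled by 1 / (1 - p) so the expected value stays the same;
    at test time the input passes through unchanged. Assumes 0 <= p < 1."""
    if not training or p == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 8
train_out = dropout(activations, p=0.5, rng=rng)        # each entry is 0.0 or 2.0
test_out = dropout(activations, p=0.5, training=False)  # unchanged at test time
print(train_out)
print(test_out)
```

Because a different random subset of nodes is switched off for every batch, no single node can be relied on exclusively during training.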

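You are now familiar with all the major components of a complete convolutional neural network, and given some examples of PyTorch code, you should be well equipped to build and train your own CNNs! Next, it is up to you to define and train a CNN for clothing recognition!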

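Notebook: Visualizing FashionMNIST

Load and Visualize FashionMNIST

In this notebook, we load and look at images from the Fashion-MNIST database.

The first step in any classification problem is to look at the dataset you are working with. This gives you some details about the format of the images and labels, as well as some insight into how you might define a network to recognize patterns in such an image set.

PyTorch has some built-in datasets that you can use, and FashionMNIST is one of them; it has already been downloaded into the data/ directory in this notebook, so all we have to do is load these images using the FashionMNIST dataset class and load the data in batches with a DataLoader.

Load the data

Dataset class and Tensors

torch.utils.data.Dataset is an abstract class representing a dataset. The FashionMNIST class is an extension of this Dataset class; it allows us to load batches of image/label data and to uniformly apply transformations to our data, such as turning all our images into Tensors for training a neural network. Tensors are similar to numpy arrays, but can also be used on a GPU to accelerate computing.

Let's see how to construct a training dataset.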

# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1]. 
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
                                   download=False, transform=data_transform)

# Print out some stats about the training data
print('Train data, number of images: ', len(train_data))

Data iteration and batching

Next, we use torch.utils.data.DataLoader, which is an iterator that allows us to batch and shuffle the data.

In the next cell, we shuffle the data and load the image/label data in batches of size 20.

# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_loader

Visualize some training data

This cell iterates over the training dataset, loading a random batch of image/label data using next(dataiter). It then plots the batch of images and labels in a 2 x batch_size/2 grid.

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# obtain one batch of training images
dataiter = iter(train_loader)
# use the built-in next(); dataiter.next() is not available in recent PyTorch releases
images, labels = next(dataiter)
images = images.numpy()
print(images[1].shape)
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    # add_subplot expects integer grid dimensions, so use integer division
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])

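Look at an image in more detail

Each image in this dataset is a 28x28 pixel, normalized, grayscale image.

A note on normalization

Normalization ensures that, as we go through a feedforward and then a backpropagation step in training our CNN, each image feature falls within a similar range of values and does not overly activate any particular layer in the network. During the feedforward step, the network takes in an input image and multiplies each input pixel by some convolutional filter weights (and adds biases), then applies some activation and pooling functions. Without normalization, it is much more likely that the gradients calculated in the backpropagation step will be quite large and cause the loss to increase instead of converge.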
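As a small sketch of what normalization means here, the function below rescales 8-bit pixel intensities to the range [0, 1], the same idea as the division by 255 used on the grayscale image earlier in these notes. The helper name `normalize_pixels` and the 2x2 sample values are hypothetical.

```python
def normalize_pixels(pixels, max_value=255):
    """Rescale integer pixel intensities from [0, max_value] to [0.0, 1.0]."""
    return [[value / max_value for value in row] for row in pixels]


# a tiny 2x2 "image" with 8-bit intensities
raw = [[0, 51],
       [204, 255]]
normalized = normalize_pixels(raw)
print(normalized)
```

After this step every pixel lies between 0 and 1, so no input feature is orders of magnitude larger than another.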

# select an image by index
idx = 2
img = np.squeeze(images[idx])
print(images.max())
# display the pixel values in that image
fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')

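13. Training in PyTorch

Once you have loaded a training dataset, your next job is to define a CNN and train it to classify that set of images.

Loss and Optimizer

To train a model, you need to define how it trains by selecting a loss function and an optimizer. These functions decide how the model updates its parameters as it trains and can affect how quickly the model converges.

See the online documentation to learn more about loss functions and optimizers.

For a classification problem like this, one typically uses cross entropy loss, which can be defined in code like: criterion = nn.CrossEntropyLoss(). PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You are encouraged to try different optimizers and see how your model responds to them as it trains.

Classification vs. Regression

The loss function you should choose depends on the kind of CNN you are trying to create; cross entropy is generally good for classification tasks, but you might choose a different loss function for a regression problem, for example one that tries to predict the (x,y) location of the center or edges of a clothing item instead of class scores.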
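To illustrate the difference, here is a plain-Python sketch of one loss of each kind. Mean squared error is used as the regression loss; that choice is an assumption of this example, since the notes do not name a specific regression loss. Note also that this `cross_entropy` helper takes probabilities, whereas PyTorch's `nn.CrossEntropyLoss` expects raw scores.

```python
import math


def cross_entropy(probabilities, true_class):
    """Classification loss: negative log of the probability of the true class."""
    return -math.log(probabilities[true_class])


def mean_squared_error(predicted, target):
    """Regression loss: average squared distance between two vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# classification: predicted class probabilities vs. the index of the true class
class_loss = cross_entropy([0.7, 0.2, 0.1], true_class=0)

# regression: predicted (x, y) location vs. the true location
location_loss = mean_squared_error([12.0, 20.0], [10.0, 24.0])

print(class_loss, location_loss)
```

The classification loss only looks at the probability assigned to the correct class, while the regression loss measures how far each predicted coordinate is from its target.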

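Training the Network

Typically, we train a network for a number of epochs, that is, cycles through the training dataset.

Here are the steps that a training function performs as it iterates over the training dataset:

  1. Prepares all input images and label data for training
  2. Passes the input through the network (forward pass)
  3. Computes the loss (how far the predicted classes are from the correct labels)
  4. Propagates gradients back into the network's parameters (backward pass)
  5. Updates the weights (parameter update)

It repeats this process until the average loss has decreased sufficiently.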
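The five steps above can be sketched without any framework. The example below trains a one-weight logistic classifier on four made-up data points with hand-written gradients; the model, data, and learning rate are assumptions chosen to keep the sketch tiny, and a real CNN would delegate steps 4 and 5 to autograd and an optimizer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# 1. prepare inputs and labels (toy data: label is 1 when x is positive)
inputs = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]

weight, bias = 0.0, 0.0
learning_rate = 0.5
loss_over_time = []

for epoch in range(20):
    grad_w, grad_b, total_loss = 0.0, 0.0, 0.0
    for x, y in zip(inputs, labels):
        # 2. forward pass
        prediction = sigmoid(weight * x + bias)
        # 3. loss: binary cross entropy for this example
        total_loss += -(y * math.log(prediction) + (1 - y) * math.log(1 - prediction))
        # 4. backward pass: gradient of the loss w.r.t. weight and bias
        grad_w += (prediction - y) * x
        grad_b += (prediction - y)
    # 5. parameter update using the average gradient
    weight -= learning_rate * grad_w / len(inputs)
    bias -= learning_rate * grad_b / len(inputs)
    loss_over_time.append(total_loss / len(inputs))

print(loss_over_time[0], loss_over_time[-1])
```

Recording the average loss per epoch, as `loss_over_time` does here, is what makes it possible to judge when the loss has decreased enough to stop.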

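In the next notebook, you will see in detail how to train and test a CNN for clothing classification.

Also, check out the exercise repository for multiple solutions to the following training challenge!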

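14. Notebook: Fashion MNIST Training Exercise

Classification with a CNN

In this notebook, we define and train a CNN to classify images from the Fashion-MNIST database.

Load the data

In this cell, we load both the training and test datasets from the FashionMNIST class.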

# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1]. 
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
                                   download=True, transform=data_transform)

test_data = FashionMNIST(root='./data', train=False,
                                  download=True, transform=data_transform)


# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))
Train data, number of images:  60000
Test data, number of images:  10000
# prepare data loaders, set the batch_size

## TODO: you can try changing the batch_size to be larger or smaller

## when you get to training your network, see how batch_size affects the loss

batch_size = 20

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

# specify the image classes

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Visualize some training data

This cell iterates over the training dataset, loading a random batch of image/label data using next(dataiter). It then plots the batch of images and labels in a 2 x batch_size/2 grid.

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
# use the built-in next(); dataiter.next() is not available in recent PyTorch releases
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    # add_subplot expects integer grid dimensions, so use integer division
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])

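Define the network architecture

The various layers that make up any neural network are documented here. For a convolutional neural network, we use a simple series of layers:

  • Convolutional layers
  • Max pooling layers
  • Fully-connected (linear) layers

You are also encouraged to consider adding dropout layers to avoid overfitting this data.

To define a neural network in PyTorch, you define the layers of a model in the function __init__ and define the feedforward behavior of the network, which uses these initialized layers, in the function forward; forward takes in an input image tensor x. The structure of this Net class is shown below and left for you to fill in.

Note: During training, PyTorch is able to perform backpropagation by keeping track of the network's feedforward behavior and using autograd to calculate the updates to the weights in the network.

Define the layers in __init__

As a reminder, a conv/pool layer may be defined like this in __init__: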

# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)

# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)      

Refer to layers in forward

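These layers are then referred to in the forward function like this, where the conv1 layer has a ReLU activation applied to it before max pooling is applied: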

x = self.pool(F.relu(self.conv1(x)))

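What you must do here is place any layers with trainable weights, such as convolutional layers, in the __init__ function and refer to them in the forward function. Any layers or functions that always behave in the same way, such as a pre-defined activation function, may appear in either the __init__ or the forward function. In practice, you will often see conv/pool layers defined in __init__ and activations defined in forward.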

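Convolutional layer

The first convolutional layer has been defined for you. This layer takes in a 1-channel (grayscale) image and outputs 10 feature maps after convolving the image with 3x3 filters.

Flattening

Recall that to move from the output of a convolutional/pooling layer to a linear (fully-connected) layer, you must first flatten the extracted features into a vector. If you have used the deep learning library Keras, you may have seen this done with Flatten(); in PyTorch you can flatten an input x with x = x.view(x.size(0), -1).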
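When sizing the first linear layer, it helps to work out how many values the flattened vector will hold. The sketch below applies the output-size formula (W - F)/S + 1 to a network with two 3x3 convolutional layers (10 and then 20 feature maps), each followed by 2x2 max pooling, which is the architecture used in the solution code in this notebook. The helper names are hypothetical.

```python
def conv_output_size(width, kernel, stride=1):
    """Output width of an unpadded convolution: (W - F) / S + 1."""
    return (width - kernel) // stride + 1


def pool_output_size(width, window=2, stride=2):
    """Output width of max pooling; any fractional size is rounded down."""
    return (width - window) // stride + 1


size = 28                                            # FashionMNIST images are 28x28
size = pool_output_size(conv_output_size(size, 3))   # conv 3x3 -> 26, pool -> 13
size = pool_output_size(conv_output_size(size, 3))   # conv 3x3 -> 11, pool -> 5
flattened = 20 * size * size                         # 20 feature maps after the second conv
print(size, flattened)
```

The resulting count is the number of input features the first linear layer must accept after x.view(x.size(0), -1).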

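TODO: Define the rest of the layers

It is up to you to define the other layers in this network; we have some recommendations, but you may change the architecture and parameters as you see fit.

Recommendations/tips:

  • Use at least two convolutional layers

  • Your output must be a linear layer with 10 outputs (for the 10 classes of clothing)

  • Use a dropout layer to avoid overfitting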

    import torch.nn as nn
    import torch.nn.functional as F
    
    class Net(nn.Module):
    
        def __init__(self):
            super(Net, self).__init__()
            
            # 1 input image channel (grayscale), 10 output channels/feature maps
            # 3x3 square convolution kernel
            ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
            # the output Tensor for one image, will have the dimensions: (10, 26, 26)
            # after one pool layer, this becomes (10, 13, 13)
            self.conv1 = nn.Conv2d(1, 10, 3)
            
            # maxpool layer
            # pool with kernel_size=2, stride=2
            self.pool = nn.MaxPool2d(2, 2)
            
            # second conv layer: 10 inputs, 20 outputs, 3x3 conv
            ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
            # the output tensor will have dimensions: (20, 11, 11)
            # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
            self.conv2 = nn.Conv2d(10, 20, 3)
            
            # 20 outputs * the 5*5 filtered/pooled map size
            # 10 output channels (for the 10 classes)
            self.fc1 = nn.Linear(20*5*5, 10)
            
    
        # define the feedforward behavior
        def forward(self, x):
            # two conv/relu + pool layers
            x = self.pool(F.relu(self.conv1(x)))
            
            x = self.pool(F.relu(self.conv2(x)))
            
            # prep for linear layer
            # flatten the inputs into a vector
            x = x.view(x.size(0), -1)
            # one linear layer
            x = F.relu(self.fc1(x))
            # log-softmax converts the 10 outputs into log-probabilities over the classes
            # (this pairs with nn.NLLLoss during training)
            x = F.log_softmax(x, dim=1)
            
            # final output
            return x
    
    # instantiate and print your Net
    net = Net()
    print(net)
    
    Net(
      (conv1): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1))
      (fc1): Linear(in_features=500, out_features=10, bias=True)
    )
    

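TODO: Specify the loss function and optimizer

Learn more about loss functions and optimizers in the online documentation.

Note that for a classification problem like this, one typically uses cross entropy loss, which can be defined in code like: criterion = nn.CrossEntropyLoss(). Cross entropy loss combines softmax and NLL loss, so, alternatively (as in this example), you may see NLL loss used when the output of the Net is already a (log-softmax) distribution of class scores.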
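The relationship between cross entropy, softmax, and NLL loss can be checked numerically. The plain-Python sketch below, with hypothetical helper names and made-up scores, computes the loss in two steps (log-softmax, then negative log-likelihood) and in one step directly from the raw scores.

```python
import math


def log_softmax(scores):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]


def nll_loss(log_probs, true_class):
    """Negative log-likelihood: minus the log-probability of the true class."""
    return -log_probs[true_class]


def cross_entropy_from_scores(scores, true_class):
    """Cross entropy computed directly from raw scores."""
    exps = [math.exp(s) for s in scores]
    return -math.log(exps[true_class] / sum(exps))


scores = [1.0, 2.0, 0.5]
two_step = nll_loss(log_softmax(scores), true_class=1)
one_step = cross_entropy_from_scores(scores, true_class=1)
print(two_step, one_step)
```

Both routes give the same value, which is why a network that already ends in log_softmax is paired with NLL loss rather than with cross entropy loss.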

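PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You are encouraged to try different optimizers and see how your model responds to them as it trains.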

import torch.optim as optim

## TODO: specify loss function 
# nn.CrossEntropyLoss() combines log-softmax and nn.NLLLoss() in one single class;
# the Net above already applies log_softmax, so nn.NLLLoss() is used here
criterion = nn.NLLLoss()

## TODO: specify optimizer 
# stochastic gradient descent with a small learning rate
optimizer = optim.SGD(net.parameters(), lr=0.001)

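A note on accuracy

It is worth looking at the accuracy of the network before and after training. This way you can really see what the network has learned. In the next cell, let's see what the accuracy of the untrained network is; we expect it to be around 10%, the same accuracy as guessing at random among the 10 classes.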

# Calculate accuracy before training
correct = 0
total = 0

# Iterate through test dataset
for images, labels in test_loader:
    #print(type(images))
    # forward pass to get outputs
    # the outputs are a series of class scores
    outputs = net(images)
    #print("outputs: ",outputs.numpy())
    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)
    #print(predicted.shape)
    # count up total number of correct labels
    # for which the predicted and true labels are equal
    total += labels.size(0)
    #print(labels)
    
    correct += (predicted == labels).sum()
    #print("correct: ",correct)
# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total

# print it out!
print('Accuracy before training: ', accuracy)

#Accuracy before training:  9.58

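Train the network

Below, we have defined a train function that takes in the number of epochs to train for.

  • The number of epochs is how many times the network cycles through the training dataset.
  • Inside the epoch loop, we loop over the training dataset in batches and record the loss every 1000 batches.

Here are the steps that this training function performs as it iterates over the training dataset:

  1. Zeros the gradients to prepare for a forward pass
  2. Passes the input through the network (forward pass)
  3. Computes the loss (how far the predicted classes are from the correct labels)
  4. Propagates gradients back into the network's parameters (backward pass)
  5. Updates the weights (parameter update)
  6. Prints out the calculated loss
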
def train(n_epochs):
    
    loss_over_time = [] # to track the loss as the network trains
    
    for epoch in range(n_epochs):  # loop over the dataset multiple times
        
        running_loss = 0.0
        
        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding labels
            inputs, labels = data

            # zero the parameter (weight) gradients
            optimizer.zero_grad()

            # forward pass to get outputs
            outputs = net(inputs)

            # calculate the loss
            loss = criterion(outputs, labels)

            # backward pass to calculate the parameter gradients
            loss.backward()

            # update the parameters
            optimizer.step()

            # print loss statistics
            # to convert loss into a scalar and add it to running_loss, we use .item()
            running_loss += loss.item()
            
            if batch_i % 1000 == 999:    # print every 1000 batches
                avg_loss = running_loss/1000
                # record and print the avg loss over the 1000 batches
                loss_over_time.append(avg_loss)
                #print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
                running_loss = 0.0

    print('Finished Training')
    return loss_over_time

# define the number of epochs to train for
n_epochs = 1 # start small to see if your model works, initially

# call train and record the loss over time
training_loss = train(n_epochs)

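Visualizing the loss

A good indication of how much your network is learning as it trains is the loss over time. In this example, we recorded the average loss for every 1000 batches in each epoch. Let's plot it and see how the loss decreases (or doesn't) over time.

In this case, you can see that the loss drops sharply early on and then flattens out over time.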

# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()

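Test the trained network

Once you are satisfied with how the loss of your model has decreased, there is one last step: testing!

You must test your trained model on a previously unseen dataset to see whether it generalizes well and can accurately classify this new dataset. For FashionMNIST, which contains many pre-processed training images, a good model should reach greater than 85% accuracy on this test dataset. If you are not reaching this value, try training for more epochs, tweaking your hyperparameters, or adding/removing layers in your CNN.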

# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set the module to evaluation mode
net.eval()

for batch_i, data in enumerate(test_loader):
    
    # get the input images and their corresponding labels
    inputs, labels = data
    
    # forward pass to get outputs
    outputs = net(inputs)

    # calculate the loss
    loss = criterion(outputs, labels)
        
    # update average test loss (running mean over the batches seen so far)
    test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # compare predictions to true label
    # this creates a `correct` Tensor that holds, for each image in the batch,
    # whether it was classified correctly
    correct = predicted.eq(labels.data.view_as(predicted))

    # calculate test accuracy for *each* object class
    # we get the scalar value of correct items for a class, by calling `correct[i].item()`
    # loop over len(labels) rather than batch_size so a smaller final batch cannot cause an IndexError
    for i in range(len(labels)):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (classes[i]))

# overall accuracy across all classes
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * sum(class_correct) / sum(class_total),
    sum(class_correct), sum(class_total)))
Test Loss: 0.784023

Test Accuracy of T-shirt/top: 92% (925/1000)
Test Accuracy of Trouser: 96% (967/1000)
Test Accuracy of Pullover:  0% ( 0/1000)
Test Accuracy of Dress: 87% (873/1000)
Test Accuracy of  Coat: 91% (911/1000)
Test Accuracy of Sandal: 94% (945/1000)
Test Accuracy of Shirt:  0% ( 0/1000)
Test Accuracy of Sneaker: 93% (935/1000)
Test Accuracy of   Bag: 96% (967/1000)
Test Accuracy of Ankle boot: 93% (938/1000)

Test Accuracy (Overall): 74% (7461/10000)

Visualizing sample test results

Format: predicted class (true class)

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)  # dataiter.next() is not available in newer PyTorch versions
# get predictions
preds = np.squeeze(net(images).data.max(1, keepdim=True)[1].numpy())
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    # add_subplot needs integer grid sizes, so use integer division
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx] else "red"))

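Question: What are some weaknesses of your model, and how might you improve them in future iterations?

Answer: This model performs well on everything except shirts and pullovers (0% accuracy). It looks like it incorrectly classifies most of those items, which share a similar overall shape, as coats. Because it performs well on every class but these two, I suspect the model generalizes poorly and is overfitting certain classes. I think its accuracy could be improved by adding some dropout layers to avoid overfitting.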
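
The per-class accuracies show that Shirt and Pullover fail, but not which classes those images are assigned to instead. One way to check the guess that they end up labeled as coats is a confusion matrix. The sketch below is a minimal pure-Python version; the helper name `confusion_matrix` and the toy label lists are hypothetical stand-ins for the `labels` and `predicted` values seen in the test loop.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Return an n_classes x n_classes table: row = true class, column = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


# Toy data (hypothetical): 0 = Pullover, 1 = Coat, 2 = Shirt
class_names = ["Pullover", "Coat", "Shirt"]
true_labels = [0, 0, 0, 1, 1, 2, 2, 2]
predicted_labels = [1, 1, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(true_labels, predicted_labels, n_classes=len(class_names))
for name, row in zip(class_names, cm):
    print("{:>8}: {}".format(name, row))
```

In the real test loop, the two lists could be built by extending them with `labels.tolist()` and `predicted.tolist()` on every batch. A row whose counts pile up in a single wrong column points to the class that is absorbing the errors.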

# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_simple.pt'

# after training, save your model parameters in the dir 'saved_models'
# when you're ready, un-comment the line below
torch.save(net.state_dict(), model_dir+model_name)
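
Saving `net.state_dict()` stores only the learned parameters, not the class definition, so loading them later means building the same architecture first and then calling `load_state_dict`. A minimal sketch, assuming a stand-in `TinyNet` model and a temporary directory (both hypothetical) in place of the notebook's `Net` class and `saved_models/` path:

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the notebook's Net class."""

    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


trained = TinyNet()

with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "tiny_net.pt")

    # save only the learned parameters
    torch.save(trained.state_dict(), path)

    # later: rebuild the same architecture, then load the parameters into it
    restored = TinyNet()
    restored.load_state_dict(torch.load(path))

# switch to evaluation mode before running inference
restored.eval()

x = torch.ones(1, 4)
same_output = torch.equal(trained(x), restored(x))
print(same_output)
```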

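16 Dropout

The next solution shows a different (improved) model for clothing classification. It has two main differences compared to the first solution:

  1. It adds a dropout layer
  2. It includes a momentum term in the optimizer: stochastic gradient descent

Why make these improvements?

Dropout

Dropout randomly turns off the (perceptron) nodes that make up the layers of a network, with some specified probability. Throwing away connections in a network may seem counterintuitive, but as a network trains, some nodes can dominate others or end up making large mistakes. Dropout gives us a way to balance the network so that every node works equally towards the same goal, and if one node makes a mistake, it will not dominate the behavior of the network. You can think of dropout as a technique that makes a network more resilient; it makes all the nodes work well as a team by ensuring that no node is too weak or too strong. In fact, it reminds me of the Chaos Monkey tool that is used to test for system/website failures.

I encourage you to look at the PyTorch dropout documentation to see how to add these layers to a network.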
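
A minimal sketch of how a dropout layer behaves, assuming PyTorch's `nn.Dropout` with `p=0.4` (the value used in the network later in this notebook) and an all-ones input chosen only for illustration. In training mode each element is zeroed with probability `p` and the survivors are scaled by `1/(1-p)`; in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

p = 0.4
drop = nn.Dropout(p=p)
x = torch.ones(1, 10)

# training mode: random elements are zeroed, the rest are scaled by 1/(1-p)
drop.train()
train_out = drop(x)
print(train_out)

# evaluation mode: dropout is disabled and the input passes through unchanged
drop.eval()
eval_out = drop(x)
print(eval_out)
```

This difference between the two modes is why the test code calls `net.eval()` before evaluating the model.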

To review what dropout does, watch Luis’s video below.

Here’s another way to prevent overfitting.

This is something that happens a lot when we train neural networks. Sometimes one part of the network has very large weights and ends up dominating the training, while another part of the network doesn’t play much of a role, so it doesn’t get trained.

To solve this, at times during training we turn this part off and let the rest of the network train. To drop nodes, we give the algorithm a parameter: the probability that each node gets dropped at a particular epoch.

Notice that some nodes may get turned off more than others, and some may never get turned off. This is OK: because we do it over and over again, on average each node gets the same treatment. This method is called **Dropout**.

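Momentum

When you train a network, you specify an optimizer that aims to reduce the errors your network makes during training. The error should generally decrease over time, though there may be some bumps along the way. Gradient descent optimization is good at finding a local minimum of the error, but it has difficulty finding the global minimum, which is the lowest value the error can reach. Adding a momentum term helps the optimizer find a local minimum, move on from it, and continue towards the global minimum!

Watch the video below to review the math behind momentum.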

Here’s another way to solve the local minimum problem.

Now, we want to go over the hump, but by now the gradient is zero or too small, so it won’t give us a good step. What if we look at the previous ones? What about, say, the average of the last few steps? If we take the average, this will take us in a direction that pushes us a bit towards the hump. Now, the average seems a bit drastic, since the step we made 10 steps ago is much less relevant than the step we last made.

Even better, we can weight each step so that the previous step matters a lot and the steps before that matter less and less. Here is where we introduce momentum.

In this way, the steps that happened a long time ago will matter less than the ones that happened recently.
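
The weighting described above can be made concrete. The sketch below assumes a momentum constant `beta` between 0 and 1 (the value 0.9 and the list of past steps are illustrative, not taken from the video): the most recent step gets weight 1, the one before it `beta`, the one before that `beta**2`, and so on. It also shows that the same sum can be accumulated one step at a time as a running velocity, which is how the quantity is usually tracked in practice.

```python
def weighted_step(past_steps, beta):
    """Explicit sum: newest step has weight 1, older steps get beta, beta**2, ..."""
    total = 0.0
    for age, step in enumerate(reversed(past_steps)):
        total += (beta ** age) * step
    return total


def running_velocity(past_steps, beta):
    """Same quantity, accumulated incrementally: v = beta * v + step."""
    velocity = 0.0
    for step in past_steps:
        velocity = beta * velocity + step
    return velocity


beta = 0.9
past_steps = [1.0, 0.5, 0.25, 0.0]  # oldest first; the latest gradient step is zero

explicit = weighted_step(past_steps, beta)
incremental = running_velocity(past_steps, beta)
print(explicit, incremental)
```

Even though the latest step here is zero (a flat spot), the combined step is still positive, which is what can carry the optimizer over the hump.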

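Notebook: FashionMNIST + dropout

Here, you must place any layers that have trainable weights, such as convolutional layers, in the __init__ function and refer to them in the forward function. Any layers or functions that always behave the same way, such as a predefined activation function, may appear in either the __init__ or the forward function. In practice, you will often see convolutional/pooling layers defined in __init__ and activations defined in forward.

Convolutional layer

The first convolutional layer has already been defined for you. It takes in a 1-channel (grayscale) image and outputs 10 feature maps after convolving the image with 3x3 filters.

Flattening

Recall that to move from the output of a convolutional/pooling layer to a linear (fully connected) layer, you must first flatten the extracted features into a vector. If you have used the deep learning library Keras, you may already be familiar with Flatten(). In PyTorch, you can flatten an input x with x = x.view(x.size(0), -1).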
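
A minimal sketch of what the flattening line does to tensor shapes. The batch size of 4 is an assumption for illustration, and the (20, 5, 5) feature-map shape is chosen to match the output of the second conv/pool stage in the network defined later in this notebook.

```python
import torch

# a batch of 4 feature-map stacks: 20 channels of 5x5 maps each
x = torch.zeros(4, 20, 5, 5)

# keep the batch dimension, collapse everything else into one vector per image
flat = x.view(x.size(0), -1)
print(flat.shape)  # torch.Size([4, 500])
```

The 500 values per image are why the first linear layer is declared with `20*5*5` inputs.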

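TODO: Define the rest of the layers

Below, you can define additional layers for this network; the choice is yours. We offer some suggestions here, but you may change the architecture and parameters as you see fit.

Recommendations and tips:

  • Use at least two convolutional layers
  • The output must be a linear layer with 10 outputs (for the 10 classes of clothing)
  • Use a dropout layer to avoid overfitting

A note on output size

For any convolutional layer, the output feature maps will have the specified depth (a depth of 10 for 10 filters in a convolutional layer), and the width/height of the resulting feature maps can be computed as the input width/height W, minus the filter size F, divided by the stride S, plus 1. The equation is: output_dim = (W-F)/S + 1, assuming a padding size of 0. You can find a derivation of this formula here: http://cs231n.github.io/convolutional-networks/#conv

For a pooling layer with a size of 2 and a stride of 2, the output dimension is reduced by a factor of 2. Read the comments in the code below to see the output size of each layer.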
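
The arithmetic above can be checked with a few lines of plain Python. This is a sketch with hypothetical helper names (`conv_output_dim`, `pool_output_dim`), assuming zero padding and square inputs, applied to the two conv/pool stages used in this notebook.

```python
def conv_output_dim(w, f, s=1):
    """Width/height after a convolution with filter size f and stride s, no padding."""
    return (w - f) // s + 1


def pool_output_dim(w, k=2, s=2):
    """Width/height after max pooling with kernel k and stride s (fractions round down)."""
    return (w - k) // s + 1


size = 28                          # FashionMNIST images are 28x28
size = conv_output_dim(size, 3)    # first 3x3 conv  -> 26
size = pool_output_dim(size)       # 2x2 pool        -> 13
size = conv_output_dim(size, 3)    # second 3x3 conv -> 11
size = pool_output_dim(size)       # 2x2 pool        -> 5
print(size)
```

The final value of 5, together with the 20 output channels of the second convolution, gives the `20*5*5` input size of the first linear layer.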

# reference for the conv output-size formula: http://cs231n.github.io/convolutional-networks/#conv
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
        # 1 input image channel (grayscale), 10 output channels/feature maps
        # 3x3 square convolution kernel
        ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
        # the output Tensor for one image, will have the dimensions: (10, 26, 26)
        # after one pool layer, this becomes (10, 13, 13)
        self.conv1 = nn.Conv2d(1, 10, 3)
        
        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)
        
        # second conv layer: 10 inputs, 20 outputs, 3x3 conv
        ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
        # the output tensor will have dimensions: (20, 11, 11)
        # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
        self.conv2 = nn.Conv2d(10, 20, 3)
        
        # 20 outputs * the 5*5 filtered/pooled map size
        self.fc1 = nn.Linear(20*5*5, 50)
        
        # dropout with p=0.4
        self.fc1_drop = nn.Dropout(p=0.4)
        
        # finally, create 10 output channels (for the 10 classes)
        self.fc2 = nn.Linear(50, 10)

    # define the feedforward behavior
    def forward(self, x):
        # two conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # prep for linear layer
        # this line of code is the equivalent of Flatten in Keras
        x = x.view(x.size(0), -1)
        
        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x)
        x = self.fc2(x)
        
        # final output
        return x

# instantiate and print your Net
net = Net()
print(net)

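TODO: Specify the loss function and optimizer

Read the online documentation to learn more about loss functions and optimizers.

Note that for a classification problem like this one, cross entropy loss is the typical choice; it can be defined in code as criterion = nn.CrossEntropyLoss(). PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You are encouraged to try different optimizers and see how your model responds to each of them during training.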

import torch.optim as optim

## TODO: specify loss function
# using cross entropy, which combines log-softmax and NLL loss
criterion = nn.CrossEntropyLoss()

## TODO: specify optimizer 
# stochastic gradient descent with a small learning rate AND some momentum
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
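
The comment on the loss function says that cross entropy combines softmax and NLL loss. A minimal sketch that checks this numerically, using made-up scores and targets (both are assumptions for illustration) and the functional forms `F.cross_entropy`, `F.log_softmax` and `F.nll_loss`:

```python
import torch
import torch.nn.functional as F

# made-up class scores (logits) for 3 images and 10 classes
torch.manual_seed(0)
logits = torch.randn(3, 10)
targets = torch.tensor([2, 7, 0])

# cross entropy applied directly to the raw scores
ce = F.cross_entropy(logits, targets)

# the same thing in two steps: log-softmax, then negative log-likelihood
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
matches = torch.allclose(ce, nll)
```

Because the loss applies the log-softmax internally, the network's forward function can return the raw scores from `fc2` without a softmax of its own.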

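Question: What are some weaknesses of your model, and how might you improve them in future iterations?

Answer: Because T-shirts, shirts, and coats have very similar overall shapes, my model has trouble telling them apart. In fact, the class with the lowest test accuracy is Shirt ("Test Accuracy of Shirt"), which the model classifies correctly only about 60% of the time.

I think this accuracy could be improved with some data augmentation for these classes, or by adding another convolutional layer to extract higher-level features.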
