Wherever you are, the best time to begin networking and finding a community within the robotics industry is now. The more ingrained you are in the robotics industry, the more likely an employer will perceive you as a roboticist.
There's no secret about where to find robotics networking events. Most organizations and groups use the same platforms as everyone else: Meetup, Eventbrite, and Facebook.
Stay up to date on robotics news. When you go to these events or are talking to another roboticist, you'll want things to talk about!
After completing this lesson, you will have a foundation for understanding how to build a powerful neural network from the ground up.
A classification CNN takes in an image and outputs a distribution of class scores, from which we can find the most likely class for a given image. As you go through this lesson, you may find this blog post useful; it describes the image classification pipeline and the layers that make up a CNN.
A CNN itself is made of many layers: layers that extract features from an input image, layers that reduce the dimensionality of the input, and layers that finally produce class scores. In this lesson, we'll cover all of these different layers so that you know how to define and train a complete CNN!
The layers that make up a classification CNN, in detail.
For this course, you are expected to know how neural networks are trained through backpropagation and which loss functions are used to train a CNN for a classification task. If you'd like to review this material, see the elective section Review: Training Neural Networks (at the bottom of the main course page, below all the lessons) and be sure to watch the videos in that section.
This review section covers:
The PyTorch logo.
We'll be using PyTorch throughout this course. PyTorch is a relatively new framework, but it is faster and more intuitive than TensorFlow variables and sessions. PyTorch is designed to look and act like ordinary Python code: PyTorch neural networks have their layers and feedforward behavior defined as a class. Defining a network as a class means you can instantiate multiple networks, change the model structure dynamically, and call these class functions during training and testing.
PyTorch is also great for testing different model architectures, which is highly encouraged in this course! PyTorch networks are modular, which makes it easy to change a single layer in a network or modify the loss function and see the effect on training. If you'd like to see how PyTorch compares with TensorFlow, we recommend reading this blog post.
Take a look at the steps below to see how preprocessing played an important role in creating this dataset.
The preprocessing steps used to create the FashionMNIST data.
The type of neural network that is most powerful for image processing tasks, such as sorting images into groups, is the convolutional neural network (CNN). CNNs consist of layers that process visual information. A CNN first takes in an input image and then passes it through these layers. There are a few different types of layers; we'll start with the most commonly used: convolutional, pooling, and fully-connected layers.
First, let's take a look at a complete CNN architecture. Below is a network called VGG-16, which has been trained to recognize a variety of image classes. It takes an image as input and outputs a predicted class for that image. The various layers are labeled, and we'll go over each type of layer in this network in the next few videos.
The VGG-16 architecture
The first layer in this network, the one that processes the input image directly, is a convolutional layer.
You may also notice that the diagram reads "convolution + ReLU"; ReLU stands for the Rectified Linear Unit activation function. This activation function is zero when the input x <= 0, and is a straight line with slope 1 when x > 0. A ReLU, like other activation functions, is typically placed after a convolutional layer to slightly transform the output so that backpropagation is more efficient and the network trains effectively.
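To make this concrete, here is a minimal sketch (not part of the original lesson; the input values are made up) of what a ReLU does to a tensor in PyTorch:

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
# negatives are zeroed out; positives pass through unchanged
print(F.relu(x))   # tensor([0., 0., 0., 1., 3.])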
We can see a clear white line defining the right edge of the car. This is because all of the corresponding regions in the car image closely resemble the filter, where we have a vertical line of dark pixels to the left of a vertical line of lighter pixels.
This image, for instance, contains many regions that would be discovered or detected by one of the four filters we defined before.
Filters that function as edge detectors are very important in CNNs, and we'll revisit them later.
Now we have an understanding of how convolution works on a grayscale image.
Q: This 3D array is best conceptualized as a stack of three two-dimensional matrices. So how do we perform a convolution on a color image?
A: The same as before, only now the filter is itself three-dimensional, with a value for each color channel.
You can think about each of the feature maps in a convolutional layer along the same lines as an image channel, and stack them to get a 3D array. Then we can use this 3D array as input to still another convolutional layer, to discover patterns within the patterns that we discovered in the first convolutional layer.
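To make this concrete, here is a small sketch of one convolutional layer feeding its stack of feature maps into another (the layer sizes here are illustrative, not from the lesson):

import torch
import torch.nn as nn

# RGB input: each of the 16 filters in conv1 is itself 3 channels deep
conv1 = nn.Conv2d(3, 16, kernel_size=3)
# conv2 treats the 16 stacked feature maps as its 3D input
conv2 = nn.Conv2d(16, 32, kernel_size=3)

x = torch.randn(1, 3, 28, 28)   # a batch holding one 28x28 color image
out = conv2(conv1(x))
print(out.shape)                # torch.Size([1, 32, 24, 24])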
Remember that, in some sense, convolutional layers aren't too different from the dense layers that you saw in the previous section. **Dense layers are fully connected**, meaning that the nodes are connected to every node in the previous layer. **Convolutional layers are locally connected**, meaning that their nodes are connected to only a small subset of the previous layer's nodes. **Convolutional layers also add parameter sharing**, but for both dense layers and convolutional layers, inference works the same way.
In the case of CNNs, the weights take the form of convolutional filters. Those filters are randomly generated, and so are the patterns that they're initially designed to detect.
As with MLPs, when we construct a CNN we will always specify a loss function. In the case of multiclass classification, this will be categorical cross-entropy loss. Then, as we train the model through backpropagation, the filters will update at each epoch to take on values that minimize the loss function.
In other words, the CNN determines what kind of patterns it needs to detect based on the loss function. We'll visualize these patterns later and see that, for instance, if our dataset contains dogs, the CNN is able to learn, on its own, filters that look like dogs.
So, to emphasize: with CNNs, we won't specify the values of the filters or tell the CNN what kind of patterns it needs to detect. These will be learned from the data.
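As a quick check of that last point, you can create an untrained convolutional layer and look at its weights; they start out random (a minimal sketch using the same Conv2d layer that appears later in this lesson):

import torch.nn as nn

conv = nn.Conv2d(1, 4, kernel_size=3)   # 4 filters, not yet trained
print(conv.weight.shape)                # torch.Size([4, 1, 3, 3])
print(conv.weight[0])                   # random values; training shapes these into useful patterns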
The various layers that make up any neural network are documented here. For a convolutional neural network, we'll use a simple series of layers:
Convolutional layers
Maxpooling layers
Fully-connected (linear) layers
To define a neural network in PyTorch, you'll create and name a new neural network class, define the layers of the network in the function __init__, and define the feedforward behavior of the network, which uses those initialized layers, in the function forward; forward takes in an input image tensor, x. The structure of such a class, called Net, is shown below.
Note: During training, PyTorch will be able to perform backpropagation on the network by keeping track of the network's feedforward behavior and using autograd to calculate the updates to the network's weights.
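As a minimal illustration of what autograd does (a standalone sketch, separate from the Net class below):

import torch

w = torch.tensor(2.0, requires_grad=True)   # a trainable weight
loss = (w * 3 - 1) ** 2                     # a tiny computation graph
loss.backward()                             # autograd computes d(loss)/dw
print(w.grad)                               # tensor(30.)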
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self, n_classes):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 32 output channels/feature maps
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # fully-connected layer
        # 32*4 input size to account for the downsampled image size after pooling
        # num_classes outputs (for n_classes of image data)
        self.fc1 = nn.Linear(32*4, n_classes)

    # define the feedforward behavior
    def forward(self, x):
        # one conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))

        # prep for linear layer by flattening the feature maps into feature vectors
        x = x.view(x.size(0), -1)
        # linear layer
        x = F.relu(self.fc1(x))

        # final output
        return x

# instantiate and print your Net
n_classes = 20 # example number of classes
net = Net(n_classes)
print(net)
Let's go over the details of what this code is doing.
Define the layers in __init__
The convolutional and maxpooling layers are defined in __init__:
# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)
# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
Refer to the layers in forward
Then these layers are referred to in the forward function like this, where a ReLU activation is applied to the conv1 layer before maxpooling:
x = self.pool(F.relu(self.conv1(x)))
Best practice is to place any layers whose weights will change during training in __init__ and refer to them in the forward function; any layers or functions that always behave the same way, such as a predefined activation function, may appear in either __init__ or forward. This is mostly a matter of style and readability.
In this notebook, our task is to visualize four filtered outputs (a.k.a. feature maps) of a convolutional layer.
Import the image
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'images/udacity_sdc.png'

# load color image
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)

# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255

# plot image
plt.imshow(gray_img, cmap='gray')
plt.show()
import numpy as np
## TODO: Feel free to modify the numbers here, to try out another filter!
filter_vals = np.array([[-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1]])
print('Filter shape: ', filter_vals.shape)
# Defining four different filters,
# all of which are linear combinations of the `filter_vals` defined above
# define four filters
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])
# For an example, print out the values of filter 1
print('Filter 1: \n', filter_1)
### do not modify the code below this line ###
# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y]<0 else 'black')
Initialize a single convolutional layer so that it contains all of the filters you've created. Note that you are not training this network here; you are initializing the weights in a convolutional layer so that you can visualize what happens after a forward pass through this network!
import torch
import torch.nn as nn
import torch.nn.functional as F

# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):

    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)

        # returns both layers
        return conv_x, activated_x

# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

# print out the layer in the network
print(model)
First, we'll define a helper function, viz_layer, that takes in a specific layer and a number of filters (optional argument), and displays the output of that layer once an image has been passed through it.
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))
    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))
Let's look at how the output of the convolutional layer differs before and after a ReLU activation function is applied.
# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)

# visualize the output of a conv layer
viz_layer(conv_layer)
# visualize the output of an activated conv layer
viz_layer(activated_layer)
Take a look at the layer that comes after the initial convolutional layers in the VGG-16 architecture.
The VGG-16 architecture
After a couple of convolutional layers (and ReLUs), you'll see a maxpooling layer in the VGG-16 network.
Maxpooling with a 2x2 area and a stride of 2.
Next, let's take a closer look at how these pooling layers work.
These so-called pooling layers often take a convolutional layer as input. Recall that a convolutional layer is a stack of feature maps, one feature map for each filter. A complicated dataset with many different object categories will require a large number of filters, each responsible for finding a pattern in the image. **More filters mean a bigger stack, which means that the dimensionality of our convolutional layers can get quite large. Higher dimensionality means we'll need to use more parameters, which can lead to overfitting**. Thus, we need a method for reducing this dimensionality. This is the role of pooling layers within a convolutional neural network.
The first type is a max pooling layer.
Max pooling layers take a stack of feature maps as input. To construct the max pooling layer, we work with each feature map separately.
Global average pooling
For a layer of this type, we specify neither a window size nor a stride. This type of pooling is a more extreme form of dimensionality reduction: it reduces each feature map to a single (average) value.
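As a minimal sketch of both pooling types (the feature map sizes here are illustrative):

import torch
import torch.nn as nn

feature_maps = torch.randn(1, 20, 12, 12)   # a stack of 20 feature maps

# max pooling with a 2x2 window and stride 2 halves the width/height
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(max_pool(feature_maps).shape)         # torch.Size([1, 20, 6, 6])

# global average pooling: no window size or stride; each map collapses to one value
global_avg = nn.AdaptiveAvgPool2d(1)
print(global_avg(feature_maps).shape)       # torch.Size([1, 20, 1, 1])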
The code for this pooling layer was already implemented earlier in the notebook.
# visualize the output of the pooling layer
viz_layer(pooled_layer)
Take a look at the layers toward the end of this model: the fully-connected layers that come after a series of convolutional and pooling layers. Note their flattened shape.
The VGG-16 architecture
The job of a fully-connected layer is to connect the input it sees to the desired form of output. Typically, this means converting a matrix of image features into a feature vector of size 1xC, where C is the number of classes. As an example, say we are sorting images into 10 classes. We could give a fully-connected layer a set of [pooled, activated] feature maps as input and ask it to use combinations of these features (multiplied, added, combined, and so on) to output a feature vector with 10 entries. This vector compresses the information in the feature maps into a single feature vector.
The very last layer you see in this network is a softmax function. The softmax function can take any vector of values as input and returns a vector of the same length whose entries are all in the range (0, 1) and together sum to 1. This function is often used in classification models since it can turn a feature vector into a probability distribution.
Consider the previous example again: a network that sorts images into one of 10 classes. The fully-connected layer can turn the feature maps into a single feature vector of size 1x10. Then the softmax function turns that vector into a probability distribution of length 10, where each number in the resulting vector represents the probability that a given input image belongs to class 1, class 2, class 3, and so on up to class 10. This output is sometimes called the class scores, and from these scores you can extract the most likely class for the given image!
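As a minimal sketch (the score values are made up), here is softmax turning a vector of 10 class scores into a probability distribution:

import torch
import torch.nn.functional as F

class_scores = torch.tensor([[1.5, 0.3, -1.0, 2.2, 0.0, 0.8, -0.5, 1.0, 0.1, -2.0]])
probs = F.softmax(class_scores, dim=1)
print(probs.sum())     # tensor(1.) -- all values fall in (0, 1) and sum to 1
print(probs.argmax())  # tensor(3) -- the most likely class for this image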
A complete CNN can be built from just convolutional, pooling, and fully-connected layers, but there are additional layers you can add to prevent overfitting. One of the most common layers for avoiding overfitting is the dropout layer.
Dropout layers essentially turn off certain nodes in a layer with some probability, p. This ensures that all nodes get an equal chance to try to classify different images during training, and it reduces the likelihood that only a few, heavily-weighted nodes will dominate the process.
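Here is a minimal sketch of a dropout layer in action (the input is made up; note that PyTorch's nn.Dropout also rescales the surviving values during training):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each node is zeroed with probability p during training
x = torch.ones(1, 10)

drop.train()               # training mode: roughly half the values get turned off
print(drop(x))             # survivors are scaled by 1/(1-p) to keep the expected sum

drop.eval()                # evaluation mode: dropout does nothing
print(drop(x))             # tensor of ones, unchanged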
Now that you are familiar with all the important components of a complete convolutional neural network, given some example PyTorch code you should be well equipped to build and train your own CNNs! Next, it will be up to you to define and train a CNN for clothing recognition!
Load and Visualize FashionMNIST
In this notebook, we load and look at images from the Fashion-MNIST database.
The first step in any classification problem is to look at the dataset you are working with. This will give you some details about the format of the images and labels, as well as some insight into how you might define a network to recognize patterns in such an image set.
PyTorch has some built-in datasets that you can use, and FashionMNIST is one of them; it has already been downloaded into the data/ directory in this notebook, so all we have to do is load these images using the FashionMNIST dataset class and load the data in batches with a DataLoader.

Load the data
Dataset classes and tensors
torch.utils.data.Dataset is an abstract class representing a dataset, and the FashionMNIST class is an extension of this Dataset class. It lets us load batches of image/label data and uniformly apply transformations to our data, such as turning all our images into tensors for training a neural network. Tensors are similar to numpy arrays, but they can also be used on a GPU to accelerate computing. Below, let's see how to construct a training dataset.
# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
                          download=False, transform=data_transform)

# Print out some stats about the training data
print('Train data, number of images: ', len(train_data))
Next we'll use torch.utils.data.DataLoader, an iterator that batches and shuffles the data. In the next cell, we shuffle the data and load image/label data in batches of size 20.
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
This cell iterates over the training dataset (loaded with train_loader), grabbing a random batch of image/label data using dataiter.next(). It then plots the batch of images and labels in a 2 x batch_size/2 grid.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()
print(images[1].shape)

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size/2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])
Each image in this dataset is a normalized, 28x28 pixel, grayscale image.
Normalization ensures that, as we go through feedforward and backpropagation steps while training our CNN, each image feature will fall within a similar range of values and not overly activate any particular layer of the network. During the feedforward step, the network takes in an input image and multiplies each input pixel by some convolutional filter weights, adds biases, and applies some activation and pooling functions. Without normalization, the gradients computed in the backpropagation step would be very large and would cause our loss to increase instead of converge.
# select an image by index
idx = 2
img = np.squeeze(images[idx])
print(images.max())

# display the pixel values in that image
fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')
After you load in the training dataset, your next task will be to define a CNN and train it to classify that set of images.
To train a model, you'll need to define how it trains by selecting a loss function and an optimizer. These functions determine how the model updates its parameters as it trains, and they can affect how quickly the model converges.
For a classification problem like this one, we typically use cross-entropy loss, which is defined in code as criterion = nn.CrossEntropyLoss(). PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You're encouraged to try different optimizers and see how your model responds to them during training.
Which loss function you choose depends on the kind of CNN you are creating; cross-entropy is generally good for classification tasks, but for a regression problem you might choose a different loss function, say, if you were trying to predict the (x, y) location of the center or edges of a clothing item instead of a class score.
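For instance, the two setups differ mainly in the loss function (a hedged sketch; the regression variant is for illustration only and is not part of this lesson's code):

import torch
import torch.nn as nn

# classification: scores over 10 classes vs. integer class labels
class_criterion = nn.CrossEntropyLoss()
scores = torch.randn(4, 10)              # a batch of 4 images, 10 class scores each
labels = torch.tensor([0, 3, 1, 9])
print(class_criterion(scores, labels))

# regression: predicting an (x, y) location instead of class scores
reg_criterion = nn.MSELoss()
pred_xy = torch.randn(4, 2)
true_xy = torch.randn(4, 2)
print(reg_criterion(pred_xy, true_xy))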
Typically, we train a network on the training dataset for some number of epochs.
Here are the steps that a training function performs as it iterates over the training dataset:
1. Zero out the parameter (weight) gradients
2. Perform a forward pass to get the outputs
3. Calculate the loss
4. Perform a backward pass to compute the parameter gradients
5. Update the weights with the optimizer
This process is repeated until the average loss has sufficiently decreased.
In the next notebook, you'll see how to train and test a CNN for clothing classification in detail.
In this notebook, we define and train a CNN to classify images from the Fashion-MNIST database.
In the cell below, we load in the training and test datasets using the FashionMNIST class.
# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
                          download=True, transform=data_transform)

test_data = FashionMNIST(root='./data', train=False,
                         download=True, transform=data_transform)

# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))
Train data, number of images: 60000
Test data, number of images: 10000
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
This cell iterates over the training dataset, loading a random batch of image/label data using dataiter.next(). It then plots the batch of images and labels in a 2 x batch_size/2 grid.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size/2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])
The various layers that make up any neural network are documented here. For a convolutional neural network, we'll use a simple series of layers:
Convolutional layers
Maxpooling layers
Fully-connected (linear) layers
We also encourage you to consider adding dropout layers to avoid overfitting this data.
To define a neural network in PyTorch, you define the layers of a model in the function __init__ and define the feedforward behavior of the network, which employs those initialized layers, in the function forward, which takes in an input image tensor, x. The structure of this Net class is shown below, and it is left to you to fill in.
Note: During training, PyTorch will be able to perform backpropagation by keeping track of the network's feedforward behavior and using autograd to calculate the updates to the weights in the network.
Define the layers in __init__
As a reminder, convolutional and maxpooling layers can be defined in __init__ like this:
# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)
# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
Refer to the layers in forward
Then these layers are referred to in the forward function like this, where the conv1 layer has a ReLU activation applied to it before maxpooling:
x = self.pool(F.relu(self.conv1(x)))
You must place any layers with trainable weights, such as convolutional layers, in the __init__ function and refer to them in the forward function; any layers or functions that always behave the same way, such as a predefined activation function, may appear in either the __init__ or forward function. In practice, you'll often see convolutional and maxpooling layers defined in __init__ and activations applied in forward.
The first convolutional layer has been defined for you. This layer takes in a 1-channel (grayscale) image and outputs 10 feature maps after the image has been convolved with 3x3 filters.
Recall that to go from the output of a convolutional/pooling layer to a linear layer (i.e., a fully-connected layer), you must first flatten your extracted features into a vector. If you've used the deep learning library Keras, you may have seen this done with Flatten(); in PyTorch, you can flatten an input x with x = x.view(x.size(0), -1).
Below, it's up to you to define any other layers you'd like to include in this network. We have a few recommendations, but you may change the architecture and parameters as you see fit.
Suggestions and tips:
Use at least two convolutional layers
Your output must be a linear layer with 10 outputs (for the 10 classes of clothing)
Use a dropout layer to avoid overfitting
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 10 output channels/feature maps
        # 3x3 square convolution kernel
        ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
        # the output Tensor for one image, will have the dimensions: (10, 26, 26)
        # after one pool layer, this becomes (10, 13, 13)
        self.conv1 = nn.Conv2d(1, 10, 3)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 10 inputs, 20 outputs, 3x3 conv
        ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
        # the output tensor will have dimensions: (20, 11, 11)
        # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
        self.conv2 = nn.Conv2d(10, 20, 3)

        # 20 outputs * the 5*5 filtered/pooled map size
        # 10 output channels (for the 10 classes)
        self.fc1 = nn.Linear(20*5*5, 10)

    # define the feedforward behavior
    def forward(self, x):
        # two conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # prep for linear layer
        # flatten the inputs into a vector
        x = x.view(x.size(0), -1)

        # one linear layer
        x = F.relu(self.fc1(x))

        # a softmax layer to convert the 10 outputs into a distribution of class scores
        x = F.log_softmax(x, dim=1)

        # final output
        return x

# instantiate and print your Net
net = Net()
print(net)
Net(
(conv1): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=500, out_features=10, bias=True)
)
To learn more about loss functions and optimizers, read this online documentation.
Note that for a classification problem like this, one typically uses cross-entropy loss, which can be defined in code as criterion = nn.CrossEntropyLoss(). Cross-entropy loss combines softmax and NLL loss, so, alternatively (and as in this example), when the output of a Net is already a distribution of class scores, you may see NLL Loss being used instead.
PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You're encouraged to try different optimizers and see how your model responds to them during training.
import torch.optim as optim
## TODO: specify loss function
# cross entropy loss combines softmax and nn.NLLLoss() in one single class.
criterion = nn.NLLLoss()
## TODO: specify optimizer
# stochastic gradient descent with a small learning rate
optimizer = optim.SGD(net.parameters(), lr=0.001)
Before and after training, take a look at the accuracy of this network. By looking at the accuracy, you can really see what the network has learned. In the next cell, let's see what the accuracy of an untrained network is. We expect it to be around 10%, which is the same accuracy as randomly guessing among all 10 classes.
# Calculate accuracy before training
correct = 0
total = 0

# Iterate through test dataset
for images, labels in test_loader:

    # forward pass to get outputs
    # the outputs are a series of class scores
    outputs = net(images)

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # count up total number of correct labels
    # for which the predicted and true labels are equal
    total += labels.size(0)
    correct += (predicted == labels).sum()

# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total

# print it out!
print('Accuracy before training: ', accuracy)
Accuracy before training: 9.58
Below, we've defined a train function that takes in a number of epochs to train for.
Here are the steps that this training function performs as it iterates over the training dataset:
1. Zero out the parameter (weight) gradients
2. Perform a forward pass to get the outputs
3. Calculate the loss
4. Perform a backward pass to compute the parameter gradients
5. Update the weights with the optimizer
def train(n_epochs):

    loss_over_time = [] # to track the loss as the network trains

    for epoch in range(n_epochs):  # loop over the dataset multiple times

        running_loss = 0.0

        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding labels
            inputs, labels = data

            # zero the parameter (weight) gradients
            optimizer.zero_grad()

            # forward pass to get outputs
            outputs = net(inputs)

            # calculate the loss
            loss = criterion(outputs, labels)

            # backward pass to calculate the parameter gradients
            loss.backward()

            # update the parameters
            optimizer.step()

            # print loss statistics
            # to convert loss into a scalar and add it to running_loss, we use .item()
            running_loss += loss.item()

            if batch_i % 1000 == 999:    # print every 1000 batches
                avg_loss = running_loss/1000
                # record and print the avg loss over the 1000 batches
                loss_over_time.append(avg_loss)
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
                running_loss = 0.0

    print('Finished Training')
    return loss_over_time
# define the number of epochs to train for
n_epochs = 1 # start small to see if your model works, initially
# call train and record the loss over time
training_loss = train(n_epochs)
A good indication of how much your network is learning as it trains is the loss over time. In this example, we print and record the average loss for every 1000 batches in every epoch. Let's plot it and see whether the loss decreases over time (or not).
In this example, you'll see that the loss drops quickly at first, and then decreases more gradually over time.
# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()
Once you are satisfied with how much your model's loss has decreased, there is one last step: test!
You must test your trained model on a previously unseen dataset to see if it generalizes well and can accurately classify this new dataset. For FashionMNIST, which contains many preprocessed training images, a good model should reach greater than 85% accuracy on this test dataset. If you don't reach this value, try training for a larger number of epochs, tweaking your hyperparameters, or adding/subtracting layers from your CNN.
# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set the module to evaluation mode
net.eval()

for batch_i, data in enumerate(test_loader):

    # get the input images and their corresponding labels
    inputs, labels = data

    # forward pass to get outputs
    outputs = net(inputs)

    # calculate the loss
    loss = criterion(outputs, labels)

    # update average test loss
    test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # compare predictions to true label
    # this creates a `correct` Tensor that holds the number of correctly classified images in a batch
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))

    # calculate test accuracy for *each* object class
    # we get the scalar value of correct items for a class, by calling `correct[i].item()`
    for i in range(batch_size):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Test Loss: 0.784023
Test Accuracy of T-shirt/top: 92% (925/1000)
Test Accuracy of Trouser: 96% (967/1000)
Test Accuracy of Pullover: 0% ( 0/1000)
Test Accuracy of Dress: 87% (873/1000)
Test Accuracy of Coat: 91% (911/1000)
Test Accuracy of Sandal: 94% (945/1000)
Test Accuracy of Shirt: 0% ( 0/1000)
Test Accuracy of Sneaker: 93% (935/1000)
Test Accuracy of Bag: 96% (967/1000)
Test Accuracy of Ankle boot: 93% (938/1000)
Test Accuracy (Overall): 74% (7461/10000)
Format: predicted class (true class)
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = dataiter.next()
# get predictions
preds = np.squeeze(net(images).data.max(1, keepdim=True)[1].numpy())
images = images.numpy()
# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size/2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx] else "red"))
Answer: The model did well on everything except shirts and pullovers (0% accuracy). It looks like it mistakenly classified most clothing items with a similar overall shape as coats. However, since it performs well on every class but these two, my guess is that this model does not generalize well and has overfit to certain classes. I think its accuracy could be improved by adding some dropout layers to avoid overfitting.
# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_simple.pt'
# after training, save your model parameters in the dir 'saved_models'
# when you're ready, un-comment the line below
torch.save(net.state_dict(), model_dir+model_name)
The next solution shows a different (improved) model for clothing classification. There are two main differences between this and the first solution:
A dropout layer has been added
The SGD optimizer uses a momentum term
Why make these improvements?
Dropout randomly turns off the (perceptron) nodes that make up the layers of a network, with some specified probability. It may seem counterintuitive to drop connections in a network, but as a network trains, some nodes can come to dominate others or end up making large mistakes. Dropout gives us a way to balance the network so that every node works equally toward the same goal, and if one makes a mistake, it won't dominate the behavior of the network. You can think of dropout as a technique that makes a network resilient; it makes all the nodes work well as a team by making sure no node is too weak or too strong. In fact, it reminds me of the Chaos Monkey tool that is used to test for system/site failures.
We recommend reading the PyTorch dropout documentation, here, to learn how to add these layers to a network.
To review what dropout does, watch Luis's video below.
Here's another way to prevent overfitting.
This is something that happens a lot when we train neural networks: sometimes one part of the network has very large weights and ends up dominating the training, while another part of the network doesn't really play much of a role, so it doesn't get trained.
What we'll do to solve this is, at times during training, turn that dominant part off and let the rest of the network train. To drop the nodes, we give the algorithm a parameter: the probability that each node gets dropped in a particular epoch.
Notice that some nodes may get turned off more than others, and some may never get turned off. This is OK because we do it over and over again; on average, each node gets the same treatment. This method is called **dropout**.
When you train a network, you specify an optimizer that aims to reduce the errors that the network makes during training. The error should generally decrease over time, but there may be some bumps in the process. Gradient descent optimization finds a local minimum of the error, but it has trouble finding the global minimum, which is the lowest the error can get. So, by adding a momentum term, we can get past a local minimum and move on toward the global one!
Watch the video below for a review of the math behind momentum.
Here's another way to solve the local minimum problem.
Now, we want to get over the hump, but by this point the gradient is zero or too small, so it won't give us a good step. What if we look at the previous steps? Say we take the average of the last few steps: that average will point us in a direction and push us a bit over the hump. A plain average seems a bit drastic, though, since the step we made 10 steps ago is much less relevant than the step we just made.
Even better, we can weight each step so that the previous step matters a lot and the steps before that matter less and less. This is where we introduce momentum.
In this way, the steps that happened a long time ago matter less than the ones that happened recently.
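Here is a minimal sketch of this idea (the toy function and coefficients are made up for illustration): each step is an exponentially weighted average of the current gradient and all previous steps. In PyTorch, this is exactly what the momentum argument to optim.SGD does, as you'll see in the optimizer code further below.

def gradient(x):
    # gradient of the toy function f(x) = x**4 - 3*x**2 + x,
    # which has a shallow local minimum near x = 1.1 and a deeper,
    # global minimum near x = -1.3
    return 4*x**3 - 6*x + 1

beta = 0.9       # momentum: how much the previous steps matter
lr = 0.01        # learning rate
x = 2.0          # starting point, on the local-minimum side of the hump
velocity = 0.0   # running, exponentially weighted average of the steps

for i in range(500):
    velocity = beta * velocity + gradient(x)
    x = x - lr * velocity

# momentum should carry x over the hump, toward the global minimum
print('x after training:', x)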
You must place any layers with trainable weights, such as convolutional layers, in the __init__ function and refer to them in the forward function; any layers or functions that always behave the same way, such as a predefined activation function, may appear in either the __init__ or forward function. In practice, you'll often see convolutional and maxpooling layers defined in __init__ and activations applied in forward.
The first convolutional layer has been defined for you. This layer takes in a 1-channel (grayscale) image and outputs 10 feature maps after the image has been convolved with 3x3 filters.
Recall that to go from the output of a convolutional/pooling layer to a linear layer (i.e., a fully-connected layer), you must first flatten your extracted features into a vector. If you've used the deep learning library Keras, you may have seen this done with Flatten(); in PyTorch, you can flatten an input x with x = x.view(x.size(0), -1).
Below, it's up to you to define any other layers you'd like to include in this network. We have a few recommendations, but you may change the architecture and parameters as you see fit.
Suggestions and tips:
For any convolutional layer, the output feature maps will have the specified depth (a depth of 10 for 10 filters in a convolutional layer), and the dimensions (width/height) of the produced feature maps can be computed as the width/height of the input image, W, minus the filter size, F, divided by the stride, S, all plus 1: output_dim = (W-F)/S + 1, assuming zero padding. You can find a derivation of this formula here: http://cs231n.github.io/convolutional-networks/#conv
For a pooling layer of size 2 and stride 2, the output dimensions are reduced by a factor of 2. Read the comments in the code below to see the output size of each layer.
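As a quick sanity check (a small helper sketch, not part of the notebook), you can compute these sizes directly:

def conv_output_dim(W, F, S=1, P=0):
    # output width/height of a conv or pooling layer: (W - F + 2P)/S + 1
    return (W - F + 2*P) // S + 1

print(conv_output_dim(28, 3))         # 26, matching conv1 in the code below
print(conv_output_dim(26, 2, S=2))    # 13, after the first 2x2, stride-2 maxpool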
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 10 output channels/feature maps
        # 3x3 square convolution kernel
        ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
        # the output Tensor for one image, will have the dimensions: (10, 26, 26)
        # after one pool layer, this becomes (10, 13, 13)
        self.conv1 = nn.Conv2d(1, 10, 3)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 10 inputs, 20 outputs, 3x3 conv
        ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
        # the output tensor will have dimensions: (20, 11, 11)
        # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
        self.conv2 = nn.Conv2d(10, 20, 3)

        # 20 outputs * the 5*5 filtered/pooled map size
        self.fc1 = nn.Linear(20*5*5, 50)

        # dropout with p=0.4
        self.fc1_drop = nn.Dropout(p=0.4)

        # finally, create 10 output channels (for the 10 classes)
        self.fc2 = nn.Linear(50, 10)

    # define the feedforward behavior
    def forward(self, x):
        # two conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # prep for linear layer
        # this line of code is the equivalent of Flatten in Keras
        x = x.view(x.size(0), -1)

        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x)
        x = self.fc2(x)

        # final output
        return x

# instantiate and print your Net
net = Net()
print(net)
To learn more about loss functions and optimizers, read this online documentation.
Note that for a classification problem like this, one typically uses cross-entropy loss, which can be defined in code as criterion = nn.CrossEntropyLoss(). PyTorch also includes some standard stochastic optimizers, such as stochastic gradient descent and Adam. You're encouraged to try different optimizers and see how your model responds to them during training.
import torch.optim as optim
## TODO: specify loss function
# using cross entropy which combines softmax and NLL loss
criterion = nn.CrossEntropyLoss()
## TODO: specify optimizer
# stochastic gradient descent with a small learning rate AND some momentum
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Answer: My model had trouble distinguishing between T-shirts, shirts, and coats, which are quite similar in overall shape. In fact, the lowest class accuracy in testing was Test Accuracy of Shirt; the model was only about 60% accurate on this type of clothing.
I suspect the accuracy could be improved by doing some data augmentation on these classes, or by adding another convolutional layer to extract even higher-level features.