Compute the scores:
$$s = f(x, W) = Wx$$
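As a minimal sketch of the score computation, assuming the row-per-sample layout the code later in this post uses (X of shape (N, D), W of shape (D, C)), the product becomes `X.dot(W)`. The toy sizes and values below are made up for illustration.

```python
import numpy as np

# Toy sizes (assumed): N = 2 samples, D = 3 features, C = 4 classes.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 1.0]])                 # shape (N, D), one sample per row
W = np.arange(12, dtype=float).reshape(3, 4)     # shape (D, C)

scores = X.dot(W)                                # shape (N, C), one row of class scores per sample
print(scores)
```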
Compute the full loss (with the regularization term):
$$L = \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq y_i}\max\left(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + 1\right) + \lambda R(W)$$
$$R(W) = \sum_{k}\sum_{l} W_{k,l}^2$$
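To make the two formulas concrete, here is a small hand-checkable sketch for a single sample. The scores, label and weight values are illustrative assumptions, not numbers from the assignment.

```python
import numpy as np

# One sample with three class scores; class 0 is assumed to be the correct one.
scores = np.array([3.2, 5.1, -1.7])
y_i = 0

margins = np.maximum(0, scores - scores[y_i] + 1)   # hinge term for every class
margins[y_i] = 0                                    # the sum skips j == y_i
data_loss = margins.sum()                           # max(0, 2.9) + max(0, -3.9)

# L2 regularization R(W): the sum of all squared weights.
W = np.array([[1.0, -2.0],
              [0.5, 0.0]])
R = np.sum(W * W)                                   # 1 + 4 + 0.25 + 0
```

Only class 1 violates the margin, so the data loss is 2.9. R(W) is 5.25.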
Gradient computation (analytic):
From
$$L_i = \sum_{j \neq y_i}\max\left(0,\; W_j X_i^T - W_{y_i} X_i^T + 1\right), \quad\text{i.e.}\quad L_i = \sum_{j \neq y_i}\max\left(0,\; S_j - S_{y_i} + 1\right),$$
it follows that a term whose margin $S_j - S_{y_i} + 1$ is at most 0 contributes zero gradient. Only terms with a positive margin contribute, and for each such term the gradient is:
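For $j \neq y_i$:
$$\frac{\partial L_i}{\partial W_j} = X_i^T$$
For $j = y_i$:
$$\frac{\partial L_i}{\partial W_{y_i}} = -X_i^T$$
Here $\partial L_i / \partial S_j = 1$ and $\partial L_i / \partial S_{y_i} = -1$, and the chain rule through the scores supplies the factor $X_i^T$. The $W_{y_i}$ contribution is added once for every class $j$ whose margin is positive.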
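Finally, average over the samples and add the regularization gradient.
This corresponds to the following lines of svm_loss_naive further below:
if margin > 0:
    dW[:, y[i]] += -X[i, :]
    dW[:, j] += X[i, :]
dW /= num_train    # average over the samples
dW += 2 * reg * W  # gradient of the regularization term reg * np.sum(W * W)
1. Compute the scores and the corresponding loss:
2. Compute the average loss:
3. Add the regularization term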
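The per-sample gradient rule can be traced by hand on a tiny case. This is a sketch with assumed toy values (D = 2, C = 3, all-zero weights so that every margin equals 1), not code from the assignment.

```python
import numpy as np

x = np.array([1.0, 2.0])   # one sample with D = 2 features (toy values)
y = 1                      # its correct class
W = np.zeros((2, 3))       # D x C weights; all zero, so every score is 0

scores = x.dot(W)
dW = np.zeros_like(W)
for j in range(W.shape[1]):
    if j == y:
        continue
    margin = scores[j] - scores[y] + 1   # equals 1 here, so it is positive
    if margin > 0:
        dW[:, y] -= x                    # correct-class column loses x once per violation
        dW[:, j] += x                    # violating class column gains x

print(dW)
```

Both wrong classes violate the margin, so their columns equal x and the correct-class column equals -2x.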
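In this exercise you will:
• implement a fully vectorized loss function for the SVM
• implement the fully vectorized expression for its analytic gradient
• check your implementation using a numerical gradient
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with SGD
• visualize the final learned weights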
The dataset used here is CIFAR-10. Loading and preprocessing the data:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Clear previously loaded variables to avoid loading the data multiple times
# (which may cause memory issues).
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
The shapes of the training and test data show that every image is 32 x 32 x 3 pixels, that the training set has 50,000 images and that the test set has 10,000.
Output:
Training data shape: (50000, 32, 32, 3)
Training labels shape: (50000,)
Test data shape: (10000, 32, 32, 3)
Test labels shape: (10000,)
Here are a few example images from each class:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
Output: (figure not preserved; a grid with seven sample images for each of the ten classes)
To run the code more efficiently, split the data into training, validation and test sets: the first 49,000 training images form the training set, the next 1,000 form the validation set, and the first 1,000 test images form the test set. We also create a small development set as a subset of the training data, which can be used during development so that the code runs faster:
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set (the 1,000 images right after the first 49,000).
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set: num_dev indices drawn from 0..num_training-1
# without replacement.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Output:
The training set has 49,000 images, the validation set 1,000 and the test set 1,000.
Train data shape: (49000, 32, 32, 3)
Train labels shape: (49000,)
Validation data shape: (1000, 32, 32, 3)
Validation labels shape: (1000,)
Test data shape: (1000, 32, 32, 3)
Test labels shape: (1000,)
Preprocessing: reshape each image into a single row
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
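Output:
Taking X_train as an example: the first dimension stays X_train.shape[0] = 49000, and the -1 given for the second dimension tells NumPy to infer it from the remaining dimensions, i.e. 32 x 32 x 3 = 3072. The final shape is therefore (49000, 3072).
So, as a sanity check, print out the shapes of the data: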
Training data shape: (49000, 3072)
Validation data shape: (1000, 3072)
Test data shape: (1000, 3072)
dev data shape: (500, 3072)
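The effect of passing -1 to `np.reshape` can be checked on a small stand-in array. The sketch below uses 5 dummy images instead of the real data.

```python
import numpy as np

# Five dummy "images" with the CIFAR-10 layout (height, width, channels).
images = np.zeros((5, 32, 32, 3))

# Keep the first axis; -1 lets NumPy infer the second one as 32 * 32 * 3.
rows = np.reshape(images, (images.shape[0], -1))
print(rows.shape)
```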
Further preprocessing: subtract the mean image
First, compute the mean image from the training data
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)  # average over the rows, one mean per pixel column
print(mean_image[:10])  # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8'))  # visualize the mean image
plt.show()
Output:
[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082
131.75402041 130.96055102 136.14328571 132.47636735 131.48467347]
Second, subtract the mean image from the training, validation, test and development data
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
Third, append a bias dimension (the bias trick) so that the SVM only has to optimize a single weight matrix W.
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
# np.hstack stacks arrays column-wise, so a column of ones is appended after the last column.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
Output:
(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
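Why the extra column of ones removes the need for a separate bias can be verified numerically. The sketch below uses made-up sizes and a hypothetical explicit bias vector `b`, which the assignment code never stores separately.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))      # N = 4 samples, D = 3 features (toy sizes)
W = rng.normal(size=(3, 2))      # D x C weights
b = rng.normal(size=2)           # hypothetical explicit bias, one entry per class

# Bias trick: append a column of ones to X and the bias as an extra row of W.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # shape (N, D + 1)
W_aug = np.vstack([W, b])                          # shape (D + 1, C)

same = np.allclose(X_aug.dot(W_aug), X.dot(W) + b)
print(same)
```

This is also why the weight visualization at the end of the post drops the last row of W before reshaping.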
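All the code in this section is written in cs231n/classifiers/linear_svm.py.
Implement the naive (looped) structured SVM loss function:
The inputs have D dimensions (features), there are C classes, and we operate on minibatches of N examples.
Inputs: W, the weights, shape (D, C); X, a minibatch of data, shape (N, D); y, the training labels, shape (N,); reg, the regularization strength.
Returns a tuple of: the loss as a single float, and the gradient with respect to W (same shape as W).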
from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange

def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero, shape (D, C)

    # compute the loss and the gradient
    num_classes = W.shape[1]  # C
    num_train = X.shape[0]    # N
    loss = 0.0
    for i in range(num_train):  # loop over the samples
        scores = X[i].dot(W)    # score vector s = f(x_i, W), shape (C,)
        correct_class_score = scores[y[i]]  # score of the true class (a scalar)
        for j in range(num_classes):  # loop over the classes
            if j == y[i]:  # skip the true class
                continue
            # SVM margin with delta = 1: S_j - S_yi + 1 for every j != y_i
            margin = scores[j] - correct_class_score + 1
            if margin > 0:  # the max() term has zero gradient when margin <= 0
                loss += margin
                # d(X_i W_j - X_i W_yi + 1) / dW_yi = -X_i
                dW[:, y[i]] += -X[i, :]
                # d(X_i W_j - X_i W_yi + 1) / dW_j = X_i
                dW[:, j] += X[i, :]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W  # gradient of reg * np.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather that first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW
As you can see, the function svm_loss_naive has been prefilled; it uses for loops to evaluate the multiclass SVM loss function.
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time
# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))
Output:
loss: 9.112255
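Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code with the existing function.
To check that the gradient is implemented correctly, you can estimate the gradient of the loss function numerically and compare the numerical estimate with the gradient you computed. Code that does this is provided: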
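# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)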
Output, i.e. a check of whether the numeric gradient and the analytic gradient agree:
numerical: 16.558587 analytic: 16.558587, relative error: 7.044564e-12
numerical: -1.877586 analytic: -1.877586, relative error: 7.754054e-11
numerical: -17.992739 analytic: -17.992739, relative error: 9.499249e-12
numerical: 26.182227 analytic: 26.182227, relative error: 9.835010e-12
numerical: -54.546606 analytic: -54.546606, relative error: 1.438603e-11
numerical: 18.977124 analytic: 18.977124, relative error: 2.583222e-12
numerical: 20.062216 analytic: 20.062216, relative error: 1.051530e-12
numerical: 18.542379 analytic: 18.542379, relative error: 5.079020e-12
numerical: 26.232349 analytic: 26.232349, relative error: 6.182360e-12
numerical: -31.468373 analytic: -31.468373, relative error: 1.159591e-11
numerical: -48.611978 analytic: -48.617267, relative error: 5.440557e-05
numerical: 1.501071 analytic: 1.508248, relative error: 2.385005e-03
numerical: -7.296033 analytic: -7.294829, relative error: 8.252591e-05
numerical: -0.337898 analytic: -0.347751, relative error: 1.437046e-02
numerical: 24.106606 analytic: 24.120138, relative error: 2.806025e-04
numerical: 3.226319 analytic: 3.227376, relative error: 1.636281e-04
numerical: 36.533221 analytic: 36.539594, relative error: 8.722431e-05
numerical: -19.570135 analytic: -19.564052, relative error: 1.554484e-04
numerical: -40.827530 analytic: -40.827961, relative error: 5.273252e-06
numerical: -10.187518 analytic: -10.186827, relative error: 3.389178e-05
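Inline question: occasionally a dimension in the gradcheck will not match exactly. What could cause such a discrepancy? Is it a reason for concern? What is a simple one-dimensional example where a gradient check could fail? How would changing the margin affect how often this happens? Hint: the SVM loss function is not, strictly speaking, differentiable.
Your Answer: Because the SVM loss function is not, strictly speaking, differentiable. max(0, x) has a kink at x = 0; when the finite-difference step straddles a kink, the numerical estimate mixes the slopes on both sides and differs from the analytic gradient. Such isolated mismatches are expected and are not by themselves a reason for concern. Note, however, that in the output above every check in the second group (reg = 5e1) has a relative error several orders of magnitude larger than the first group (reg = 0). That pattern is consistent with a regularization gradient of reg * W where the loss reg * np.sum(W * W) calls for 2 * reg * W, rather than with hinge kinks alone.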
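A one-dimensional failure case can be sketched with the hinge function itself. The step size and evaluation points below are assumptions chosen so that the centered difference straddles the kink at 0.

```python
def hinge(x):
    """One-dimensional hinge: max(0, x), not differentiable at x = 0."""
    return max(0.0, x)

def numeric_grad(f, x, h=1e-5):
    """Centered finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

x_near_kink = 1e-6                                 # closer to the kink than the step h
analytic_near = 1.0 if x_near_kink > 0 else 0.0    # true slope on the positive side
numeric_near = numeric_grad(hinge, x_near_kink)    # the difference straddles the kink

x_far = 1.0                                        # well away from the kink
numeric_far = numeric_grad(hinge, x_far)

print(analytic_near, numeric_near, numeric_far)
```

Near the kink the numerical estimate is about 0.55 while the analytic slope is 1. Away from the kink both agree.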
Next, implement the function svm_loss_vectorized. For now only the loss is computed; the gradient is implemented afterwards.
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    scores = X.dot(W)       # shape (N, C)
    num_train = X.shape[0]

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # The first index selects every row (0..N-1), the second selects column y[i]
    # in row i, which yields the score of the correct class for each image.
    correct_scores = scores[np.arange(num_train), y]          # shape (N,)
    correct_scores = correct_scores.reshape((num_train, -1))  # shape (N, 1)
    margins = np.maximum(0, scores - correct_scores + 1)      # shape (N, C)
    margins[range(num_train), y] = 0  # the correct class contributes no loss
    loss += np.sum(margins)
    loss /= num_train                 # average over the samples
    loss += reg * np.sum(W * W)       # add regularization
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # Set entries with a positive margin to 1; all other entries stay 0.
    margins[margins > 0] = 1  # shape (N, C)
    # The correct-class column also receives gradient: minus the number of
    # classes that violated the margin for that sample.
    row_num = -np.sum(margins, 1)
    margins[np.arange(num_train), y] = row_num
    dW += np.dot(X.T, margins)  # (D, N) x (N, C) -> (D, C)
    dW /= num_train             # average over the samples
    dW += 2 * reg * W           # gradient of reg * np.sum(W * W)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))
from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))
# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))
Output:
Naive loss: 9.112255e+00 computed in 0.157111s
Vectorized loss: 9.112255e+00 computed in 0.004004s
difference: -0.000000
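# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)
Output: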
Naive loss and gradient: computed in 0.150105s
Vectorized loss and gradient: computed in 0.004003s
difference: 0.000000
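The integer-array indexing that svm_loss_vectorized relies on can be followed by hand on a tiny score matrix. The values below are made up for illustration.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])    # N = 2 samples, C = 3 classes
y = np.array([2, 0])                    # correct class of each sample

rows = np.arange(scores.shape[0])
correct = scores[rows, y].reshape(-1, 1)        # picks scores[0, 2] and scores[1, 0]
margins = np.maximum(0, scores - correct + 1)   # broadcast (N, C) against (N, 1)
margins[rows, y] = 0                            # drop the j == y_i terms
loss = margins.sum() / scores.shape[0]          # average data loss

print(correct.ravel(), margins, loss)
```

The first sample is classified with enough margin and contributes nothing. The second contributes 2 + 3, so the average loss is 2.5.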
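Inside a training loop, repeat the following steps for as long as necessary:
(1) Draw a batch of training samples x and corresponding targets y.
(2) Run the network on x (this step is called the forward pass) to obtain predictions y_pred.
(3) Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
(4) Update all weights of the network in a way that slightly reduces the loss on this batch.
The resulting network has a very low loss on its training data, i.e. a small mismatch between the predictions y_pred and the expected targets y.
The parameters are adjusted little by little, based on the current loss on a random batch of data. Because the function is differentiable (almost everywhere, in the case of the SVM loss), you can compute its gradient, which gives an efficient way to implement step 4. Moving the weights in the direction opposite to the gradient, with a small enough step, lowers the loss on the batch a little each time.
1. Draw a batch of training samples x and corresponding targets y.
2. Run the network on x to obtain predictions y_pred.
3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
4. Compute the gradient of the loss with respect to the network's parameters (a backward pass).
5. Move the parameters a little in the direction opposite to the gradient, e.g. W -= step * gradient, thus reducing the loss on the batch a bit.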
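The five steps can be sketched on a toy problem. The example below is an illustration under stated assumptions: it swaps the SVM loss for a noise-free least-squares loss so that the result is easy to verify, and the data, step size and iteration count are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # toy data, N = 200, D = 3
w_true = np.array([1.0, -2.0, 0.5])       # weights the loop should recover
y = X.dot(w_true)                         # noise-free targets

w = np.zeros(3)
step = 0.05
batch_size = 20
loss_history = []
for it in range(300):
    idx = rng.choice(X.shape[0], size=batch_size)   # 1. draw a batch (with replacement)
    X_batch, y_batch = X[idx], y[idx]
    y_pred = X_batch.dot(w)                         # 2. forward pass
    residual = y_pred - y_batch
    loss = np.mean(residual ** 2)                   # 3. loss on the batch
    gradient = 2 * X_batch.T.dot(residual) / batch_size  # 4. gradient w.r.t. w
    w -= step * gradient                            # 5. step against the gradient
    loss_history.append(loss)

print(loss_history[0], loss_history[-1], w)
```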
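We now have an efficient, vectorized expression for the loss and the gradient, and the gradient matches the numerical gradient. We are therefore ready to run SGD to minimize the loss:
In the file linear_classifier.py, implement SGD in the function LinearClassifier.train():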
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing.
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """
    num_train, dim = X.shape
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
        # lazily initialize W
        self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        X_batch = None
        y_batch = None

        #########################################################################
        # TODO:                                                                 #
        # Sample batch_size elements from the training data and their           #
        # corresponding labels to use in this round of gradient descent.        #
        # Store the data in X_batch and their corresponding labels in           #
        # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
        # and y_batch should have shape (batch_size,)                           #
        #                                                                       #
        # Hint: Use np.random.choice to generate indices. Sampling with         #
        # replacement is faster than sampling without replacement.              #
        #########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        i = np.random.choice(a=num_train, size=batch_size)  # batch_size random indices
        X_batch = X[i, :]  # the selected samples and their features
        y_batch = y[i]     # the labels of the selected samples
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)

        # perform parameter update
        #########################################################################
        # TODO:                                                                 #
        # Update the weights using the gradient and the learning rate.          #
        #########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Move the parameters a little against the gradient so that the loss
        # on this batch decreases; learning_rate is the step size.
        self.W -= learning_rate * grad
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history

def loss(self, X_batch, y_batch, reg):
    """
    Compute the loss function and its derivative.
    Subclasses will override this.

    Inputs:
    - X_batch: A numpy array of shape (N, D) containing a minibatch of N
      data points; each point has dimension D.
    - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
    - reg: (float) regularization strength.

    Returns: A tuple containing:
    - loss as a single float
    - gradient with respect to self.W; an array of the same shape as W
    """
    loss = 0.0
    dW = np.zeros(self.W.shape)

    # Loss:
    num_train = X_batch.shape[0]
    scores = X_batch.dot(self.W)                                           # (N, C)
    correct_scores = scores[np.arange(num_train), y_batch].reshape(-1, 1)  # (N, 1)
    margins = np.maximum(0, scores - correct_scores + 1)
    margins[np.arange(num_train), y_batch] = 0  # the correct class contributes no loss
    loss += np.sum(margins)
    loss /= num_train
    loss += reg * np.sum(self.W * self.W)

    # Gradient:
    margins[margins > 0] = 1
    row_num = -np.sum(margins, 1)
    margins[np.arange(num_train), y_batch] = row_num
    dW += np.dot(X_batch.T, margins) / num_train + 2 * reg * self.W
    return loss, dW
Then run it with the code below:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))
Output:
iteration 0 / 1500: loss 799.173590
iteration 100 / 1500: loss 474.736085
iteration 200 / 1500: loss 288.689859
iteration 300 / 1500: loss 175.735395
iteration 400 / 1500: loss 107.535725
iteration 500 / 1500: loss 67.656875
iteration 600 / 1500: loss 42.015562
iteration 700 / 1500: loss 28.268911
iteration 800 / 1500: loss 18.718019
iteration 900 / 1500: loss 13.551437
iteration 1000 / 1500: loss 10.574625
iteration 1100 / 1500: loss 8.425946
iteration 1200 / 1500: loss 7.439375
iteration 1300 / 1500: loss 6.384190
iteration 1400 / 1500: loss 6.081119
That took 7.483288s
A useful debugging strategy is to plot the loss as a function of the iteration number:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()
Write the LinearSVM predict function.
def predict(self, X):
    """
    Use the trained weights of this linear classifier to predict labels for
    data points.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.

    Returns:
    - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
      array of length N, and each element is an integer giving the predicted
      class.
    """
    y_pred = np.zeros(X.shape[0])
    ###########################################################################
    # TODO:                                                                   #
    # Implement this method. Store the predicted labels in y_pred.            #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # X is (N, D) and W is (D, C)
    scores = X.dot(self.W)             # shape (N, C)
    y_pred = np.argmax(scores, axis=1)  # highest-scoring class per sample, shape (N,)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return y_pred
With the predict function written, evaluate the performance on the training and validation sets
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))
Output:
training accuracy: 0.382551
validation accuracy: 0.386000
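How `np.argmax` and `np.mean` turn scores into an accuracy figure can be seen on three made-up samples. The numbers are illustrative only.

```python
import numpy as np

scores = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])            # N = 3 samples, C = 2 classes
y_true = np.array([1, 0, 0])

y_pred = np.argmax(scores, axis=1)         # index of the highest score in each row
accuracy = np.mean(y_true == y_pred)       # fraction of correct predictions

print(y_pred, accuracy)
```

Two of the three predictions match the labels, so the accuracy is 2/3.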
You should experiment with different ranges of learning rates and regularization strengths; if you are careful you should be able to reach a classification accuracy of about 0.39 on the validation set.
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.

# Note: you may see runtime/overflow warnings during hyper-parameter search.
# This may be caused by extreme values, and is not a bug.

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1    # The highest validation accuracy that we have seen so far.
best_svm = None  # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        svm = LinearSVM()  # train one linear SVM per hyperparameter combination
        loss_history = svm.train(X_train, y_train, learning_rate=learning_rate,
                                 reg=regularization_strength, num_iters=1500,
                                 verbose=True)
        y_train_pred = svm.predict(X_train)           # predictions, not yet an accuracy
        train_acc = np.mean(y_train == y_train_pred)  # training accuracy
        y_val_pred = svm.predict(X_val)               # predictions, not yet an accuracy
        val_acc = np.mean(y_val == y_val_pred)        # validation accuracy
        if val_acc > best_val:
            best_val = val_acc  # keep the best validation accuracy
            best_svm = svm      # and the LinearSVM object that achieved it
        # key: (learning_rate, regularization_strength)
        # value: (training_accuracy, validation_accuracy)
        results[(learning_rate, regularization_strength)] = (train_acc, val_acc)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
I got stuck on the assignment to the results dictionary and ended up following someone else's solution. Pay attention to how it is written: the key is the (learning_rate, regularization_strength) tuple. It took me a long time to find my mistake…
Output:
...
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.380755 val accuracy: 0.377000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.365347 val accuracy: 0.366000
lr 5.000000e-05 reg 2.500000e+04 train accuracy: 0.169265 val accuracy: 0.181000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.055898 val accuracy: 0.048000
best validation accuracy achieved during cross-validation: 0.377000
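The accuracies printed above can be put back into a dictionary of the same form as results to show how the best setting is picked. This is only a sketch; the variable names `best_key`, `best_lr` and `best_reg` are introduced here for illustration.

```python
# The accuracies printed above, keyed by (learning_rate, regularization_strength).
results = {
    (1e-7, 2.5e4): (0.380755, 0.377000),
    (1e-7, 5e4): (0.365347, 0.366000),
    (5e-5, 2.5e4): (0.169265, 0.181000),
    (5e-5, 5e4): (0.055898, 0.048000),
}

# Pick the key whose validation accuracy (second element of the value) is highest.
best_key = max(results, key=lambda k: results[k][1])
best_lr, best_reg = best_key
best_val = results[best_key][1]

print(best_lr, best_reg, best_val)
```

In this run both settings with the larger learning rate 5e-5 perform much worse than the settings with 1e-7.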
Visualize the cross-validation results:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results]  # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()
Output: (figure not preserved; two scatter plots titled "CIFAR-10 training accuracy" and "CIFAR-10 validation accuracy", with log learning rate on the x axis and log regularization strength on the y axis)
Evaluate the best SVM on the test set:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)
Output:
linear SVM on raw pixels final test set accuracy: 0.370000
Visualize the learned weights for each class:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:]  # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
Output: (figure not preserved; one weight image per class)
Inline question: describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.
Your Answer: They look like blurry templates, because each one has been learned from all the images of the dataset. (Taken from someone else's answer…)