In this problem we use a student's scores on two exams to predict whether the student is admitted.
% Load Data
% The first two columns contain the exam scores and the third column contains the label.
data = load('ex2data1.txt');
X = data(:, [1, 2]);
y = data(:, 3);
Here I took a shortcut and used the data-plotting function provided by Andrew Ng directly:
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

% Find Indices of Positive and Negative Examples
pos = find(y == 1);
neg = find(y == 0);

% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);

% =========================================================================

hold off;

end
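The main step is to use `find` to get the row indices where `y==0` and where `y==1`. Indexing `X` with those indices lets us plot the two groups of students with different markers: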
% Plot the data with + indicating (y = 1) examples and o indicating (y = 0) examples.
plotData(X, y);
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
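The `find` step inside `plotData` can be tried on its own. Below is a minimal sketch as a standalone MATLAB/Octave script. The four students and their scores are made up for illustration and are not taken from ex2data1.txt:

```matlab
% Hypothetical mini dataset: 4 students, columns = exam 1 score, exam 2 score
X = [34 78; 60 86; 79 75; 45 56];
y = [0; 1; 1; 0];            % 1 = admitted, 0 = not admitted

pos = find(y == 1);          % row indices of admitted students: [2; 3]
neg = find(y == 0);          % row indices of rejected students: [1; 4]

admitted = X(pos, :);        % rows 2 and 3 of X
rejected = X(neg, :);        % rows 1 and 4 of X

% Logical indexing selects the same rows without calling find
admitted_logical = X(y == 1, :);
```

`plotData` then passes `X(pos, 1)` and `X(pos, 2)` to `plot` as the x and y coordinates of the admitted group.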
The plot shows that a straight line separates the two groups reasonably well. So it is enough to try to fit a line, which takes three parameters $\theta_0,\theta_1,\theta_2$.
Since this is logistic regression, we need the logistic (sigmoid) function. Its form is simple:

$$g(z)=\frac{1}{1+e^{-z}}$$
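Writing it is therefore not complicated. Just put a dot before the operators (e.g. `./` instead of `/`) so that the function also works element-wise on vectors and matrices: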
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1 ./ (1 + exp(-z));   % ./ makes the division element-wise
% =============================================================

end
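A quick way to check that the element-wise version behaves as intended is to feed it a scalar, a vector and a matrix. The sketch below uses an anonymous function with the same formula as `sigmoid.m` so that it runs as a standalone script:

```matlab
% Same formula as sigmoid.m, written as an anonymous function for a standalone script
sigmoid = @(z) 1 ./ (1 + exp(-z));

s0 = sigmoid(0);              % scalar: exactly 0.5
sv = sigmoid([-10 0 10]);     % vector: close to [0 0.5 1]
sm = sigmoid(zeros(2, 3));    % matrix: 2x3 matrix filled with 0.5

% Symmetry of the logistic function: g(-z) = 1 - g(z)
symErr = abs(sv(1) - (1 - sv(3)));
```

Without the dot, `1 / (1 + exp(-z))` is interpreted as a matrix right division. Depending on the shape of `z`, it either raises a dimension error or returns a result that is not element-wise.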
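This time we don't write gradient descent by hand. Instead, we solve the problem directly with MATLAB's advanced optimizer `fminunc`. For that we need a cost-and-gradient function, `costFunction`, to hand to the optimizer. It must return both the value of the cost function and the gradient with respect to each parameter.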
The cost function is defined as

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$

where $h_\theta(x)=g(\theta^T x)$ and $m$ is the number of training examples.
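We already have the matrices

$$X=\begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)}\\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)}\\ \vdots & \vdots & \ddots & \vdots\\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix},\quad \Theta=\begin{bmatrix}\theta_0\\ \theta_1\\ \vdots\\ \theta_n\end{bmatrix},\quad Y=\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\end{bmatrix}$$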
The cost computation can then be vectorized as

$$J(\theta)=-\frac{1}{m}\left[Y^T\log\left(g(X\Theta)\right)+(1-Y)^T\log\left(1-g(X\Theta)\right)\right]$$

where $\log$ and $g$ are applied element-wise.
The gradient is the column vector made up of the partial derivatives $\frac{\partial J(\theta)}{\partial \theta_j}$, where

$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
This can also be vectorized and computed quickly with matrix operations:

$$\nabla_\Theta J(\theta)=\frac{1}{m}X^T\left(g(X\Theta)-Y\right)$$
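To convince yourself that the vectorized expressions compute the same thing as the sums, you can evaluate both on a small example. This sketch is a standalone MATLAB/Octave script. The data and parameters are made up and are not values from the exercise:

```matlab
% Hypothetical toy data (intercept column included) and parameters
X = [1 34 78; 1 60 86; 1 79 75; 1 45 56];
y = [0; 1; 1; 0];
theta = [-5; 0.05; 0.04];
[m, n1] = size(X);
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Loop version: one term of each sum per training example
J_loop = 0;
grad_loop = zeros(n1, 1);
for i = 1:m
    h_i = sigmoid(X(i, :) * theta);                  % h_theta(x^(i))
    J_loop = J_loop - (y(i) * log(h_i) + (1 - y(i)) * log(1 - h_i)) / m;
    grad_loop = grad_loop + (h_i - y(i)) * X(i, :)' / m;
end

% Vectorized version, as in the matrix formulas
h = sigmoid(X * theta);
J_vec = -(y' * log(h) + (1 - y)' * log(1 - h)) / m;
grad_vec = X' * (h - y) / m;
```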
The code is as follows:
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
h = sigmoid(X * theta);                          % h_theta(x) for every example
J = -(y' * log(h) + (1 - y)' * log(1 - h)) / m;  % vectorized cost
grad = X' * (h - y) / m;                         % vectorized gradient
% =============================================================

end
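Two cheap sanity checks are worth running on a cost-and-gradient function like this:

- At $\theta=0$ every prediction is $g(0)=0.5$, so $J$ must equal $\log 2\approx 0.693$ whatever the data.
- The analytic gradient should agree with a central finite-difference approximation.

The sketch below re-implements the two formulas from `costFunction` as anonymous functions, so it runs as a standalone script, and it uses made-up data. The step size `delta` and the test point `theta` are arbitrary choices:

```matlab
% Hypothetical toy data: m = 4 examples, intercept column already added
X = [1 34 78; 1 60 86; 1 79 75; 1 45 56];
y = [0; 1; 1; 0];
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(t) -(y' * log(sigmoid(X * t)) + (1 - y)' * log(1 - sigmoid(X * t))) / m;
gradJ = @(t) X' * (sigmoid(X * t) - y) / m;

% Check 1: with theta = 0, h = 0.5 for every example, so J = log(2)
J0 = costJ(zeros(3, 1));

% Check 2: central finite differences vs. the analytic gradient
theta = [0.1; -0.01; 0.02];      % arbitrary test point
delta = 1e-6;                    % finite-difference step
numGrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = delta;
    numGrad(j) = (costJ(theta + e) - costJ(theta - e)) / (2 * delta);
end
relErr = norm(numGrad - gradJ(theta)) / norm(numGrad + gradJ(theta));
```

A relative error around `1e-8` or smaller usually means the gradient is implemented correctly. An error close to 1 points to a bug such as a missing transpose.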
Preprocessing: prepend a column of ones to `X` (the intercept term).
Initialization: all zeros.
% Setup the data matrix appropriately
[m, n] = size(X);
% Add intercept term to X
X = [ones(m, 1) X];
% Initialize the fitting parameters
initial_theta = zeros(n + 1, 1);
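Solving directly with `fminunc` has three advantages:

- we don't have to implement the optimization algorithm ourselves;
- it typically converges faster than gradient descent;
- there is no learning rate $\alpha$ to choose by hand.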
The `fminunc` call looks like this:
% Set options for fminunc
options = optimoptions(@fminunc,'Algorithm','Quasi-Newton','GradObj', 'on', 'MaxIter', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
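First we set the options for `fminunc`:

- `'Algorithm'` selects the quasi-Newton method.
- `GradObj` set to `on` tells `fminunc` that our function returns both the cost value and the gradient, so it uses that gradient while searching for the minimum.
- `MaxIter` caps the number of iterations at 400.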
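Next we define an anonymous function of `t`, `@(t)(costFunction(t, X, y))`. We pass it to `fminunc` together with the initial value `initial_theta` and the options `options`. `fminunc` then finds the minimum and returns the minimizing argument `theta` and the function value `cost`. `theta` holds the parameter vector we need.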
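The anonymous function turns the three-argument `costFunction` into a function of `theta` alone, which is the form `fminunc` expects. `X` and `y` are captured by value at the moment the handle is created. Here is a minimal sketch of that behavior, using a hypothetical stand-in `cost` instead of `costFunction`:

```matlab
% Hypothetical stand-in for costFunction: any function of (theta, X, y)
cost = @(theta, X, y) sum((X * theta - y) .^ 2);

X = [1 2; 1 3];
y = [1; 2];
f = @(t) cost(t, X, y);   % X and y are captured here, by value

before = f([0; 1]);       % uses the captured X and y
X = zeros(2);             % changing X afterwards ...
after  = f([0; 1]);       % ... does not affect f
```

So if you modify `X` later, for example by adding the intercept column, you need to create the handle again before calling `fminunc`.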
To see how good the parameters are, we can plot the decision boundary together with the dataset. Again I borrow Andrew Ng's code:
function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the
%   positive examples and o for the negative examples. X is assumed to be
%   a either
%   1) Mx3 matrix, where the first column is an all-ones column for the
%      intercept.
%   2) MxN, N>3 matrix, where the first column is all-ones

% Plot Data
plotData(X(:,2:3), y);
hold on

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2, max(X(:,2))+2];

    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)

    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour

    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off

end
% Plot Boundary
plotDecisionBoundary(theta, X, y);
% Add some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;
It looks pretty good.
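Where does the line in `plotDecisionBoundary` come from? Logistic regression predicts 1 when $g(\theta^T x)\ge 0.5$, that is, when $\theta^T x \ge 0$. The boundary is therefore $\theta_0+\theta_1x_1+\theta_2x_2=0$, which gives $x_2=-(\theta_0+\theta_1x_1)/\theta_2$, exactly the `plot_y` expression. The sketch below checks this with a made-up `theta`, not the one returned by `fminunc`:

```matlab
% Hypothetical parameter vector [theta0; theta1; theta2], for illustration only
theta = [-25; 0.2; 0.2];

% Boundary: theta0 + theta1*x1 + theta2*x2 = 0  =>  x2 = -(theta1*x1 + theta0) / theta2
plot_x = [30, 100];                                    % two x1 endpoints
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));

% Every point on the line should satisfy theta' * [1; x1; x2] = 0
residual = theta' * [ones(1, 2); plot_x; plot_y];
```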
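With the parameters we can make predictions. Multiply the data to be predicted by the parameter vector, pass the result through the logistic function, and round it. Rounding is the same as thresholding the probability at 0.5: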
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%
p = round(sigmoid(X * theta));   % probabilities >= 0.5 round to 1
% =========================================================================

end
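Rounding the probability is equivalent to thresholding at 0.5, which in turn is equivalent to checking the sign of $X\theta$. Here is a short sketch with made-up values of `X*theta`. MATLAB's `round` rounds 0.5 up to 1, which matches the `>= 0.5` rule in the comment of `predict`:

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-3; -0.1; 0; 0.1; 3];         % hypothetical values of X*theta
p_round  = round(sigmoid(z));      % what predict.m computes
p_thresh = double(z >= 0);         % sigmoid(z) >= 0.5 exactly when z >= 0
```

To predict a single new student, remember to include the leading 1 for the intercept, e.g. `sigmoid([1, score1, score2] * theta)`, where `score1` and `score2` are the student's two exam scores.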
The accuracy on the training set can be computed with the following code:
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
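In the problem above a straight line is enough, so we only use three parameters and need no regularization. In the next problem, however, the dataset looks like this when visualized: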
% The first two columns contains the X values and the third column
% contains the label (y).
data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);
plotData(X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
% Specified in plot order
legend('y = 1', 'y = 0')
hold off;
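Clearly this problem cannot be solved with a straight line, so we need to consider higher-order terms. However, $x_1$ and $x_2$ can be combined into infinitely many higher-order terms, and it is hard to decide by hand which ones to use.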
So we simply include every term from degree 0 to degree 6. Andrew Ng's `mapFeature` function puts all 28 terms, i.e. 28 features, into the matrix.
function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
% Add Polynomial Features
% Note that mapFeature also adds a column of ones for us, so the intercept term is handled
X = mapFeature(X(:,1), X(:,2));
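Why 28? For each degree $i$ from 0 to 6 there are $i+1$ monomials $x_1^{i-j}x_2^j$ ($j=0,\dots,i$). The total is $1+2+\cdots+7=\frac{(6+1)(6+2)}{2}=28$. The sketch below repeats the loop from `mapFeature` on two made-up samples and checks the count:

```matlab
degree = 6;
X1 = [0.5; -0.2];                  % hypothetical first feature of two samples
X2 = [1.0;  0.3];                  % hypothetical second feature of two samples

out = ones(size(X1(:,1)));         % degree-0 term (the intercept column)
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)) .* (X2.^j);   % monomial x1^(i-j) * x2^j
    end
end

nFeatures = size(out, 2);                     % number of columns produced
expected  = (degree + 1) * (degree + 2) / 2;  % 1 + 2 + ... + 7
```

Columns 2 and 3 of `out` are the original `X1` and `X2` (the $i=1$ terms). The remaining columns are the higher-order terms.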
With regularization, both the cost function and the gradient gain a regularization term. The cost function becomes

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
The gradient becomes

$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j,\qquad j=1,\dots,n$$
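So the unregularized function only needs a small change: add the parameter $\lambda$ and add the regularization terms to the results. Note that $\theta_0$ is not regularized.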
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

h = sigmoid(X * theta);
J = -(y' * log(h) + (1 - y)' * log(1 - h)) / m;
J = J + (lambda / (2 * m)) * (theta' * theta - theta(1)^2);  % exclude theta_0 = theta(1)
grad = X' * (h - y) / m;
tmp = grad(1);                        % save the unregularized gradient of theta_0
grad = grad + (lambda / m) * theta;
grad(1) = tmp;                        % restore it: theta_0 gets no penalty
% =============================================================

end
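Saving and restoring `grad(1)` works. Another common way to leave $\theta_0$ unpenalized is to zero the first entry in a copy of `theta` and use that copy in both regularization terms. The sketch below is a standalone script with made-up data. It checks that the two formulations give the same cost and gradient:

```matlab
% Hypothetical toy data (intercept column included), for illustration only
X = [1 0.5 1.0; 1 -0.2 0.3; 1 0.8 -0.6; 1 -0.7 -0.4];
y = [1; 1; 0; 0];
m = length(y);
lambda = 1;
theta = [0.3; -1.2; 0.7];
sigmoid = @(z) 1 ./ (1 + exp(-z));
h = sigmoid(X * theta);

% Masked version: reg is theta with theta_0 set to zero
reg  = [0; theta(2:end)];
J    = -(y' * log(h) + (1 - y)' * log(1 - h)) / m + lambda / (2 * m) * (reg' * reg);
grad = X' * (h - y) / m + (lambda / m) * reg;

% Same quantities computed the way costFunctionReg.m does it
Jref = -(y' * log(h) + (1 - y)' * log(1 - h)) / m ...
       + (lambda / (2 * m)) * (theta' * theta - theta(1)^2);
gref = X' * (h - y) / m;
tmp = gref(1);
gref = gref + (lambda / m) * theta;
gref(1) = tmp;
```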
Solving works much the same way; we only need to set the parameter $\lambda$:
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
lambda = 1;
% Set Options
options = optimoptions(@fminunc,'Algorithm','Quasi-Newton','GradObj', 'on', 'MaxIter', 1000);
% Optimize
[theta, J, exit_flag] = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
The result is quite good:
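The regularization parameter is the key to balancing how closely the model fits the training set against how large the parameters become. If it is chosen poorly, the model can still overfit or underfit.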
With $\lambda=0$, the model overfits:
With $\lambda=100$, the model underfits: