
Machine Learning: Binary Logistic Regression with MATLAB's fminunc and Regularization


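Prerequisites

The theory of logistic regression is covered in a separate post.
The theory of regularization is covered in a separate post.

Unregularized binary logistic regression

In this problem, we use the scores from two exams to predict whether a student will be admitted.

Loading the data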

% Load Data
% The first two columns contain the exam scores and the third column contains the label.
data = load('ex2data1.txt');
X = data(:, [1, 2]); 
y = data(:, 3);

Visualizing the data

I took a shortcut here and used the plotting function provided by Andrew Ng directly:

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure 
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%


    % Find Indices of Positive and Negative Examples
    pos = find(y==1); neg = find(y == 0);
    % Plot Examples
    plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2, 'MarkerSize', 7);
    plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y','MarkerSize', 7);






% =========================================================================



hold off;

end


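The key step is to use find to get the indices of the examples with y==1 and y==0. Indexing X with those indices lets us draw the two groups of students with different markers: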

% Plot the data with + indicating (y = 1) examples and o indicating (y = 0) examples.
plotData(X, y);
 
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')

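[Figure: scatter plot of the training data, Exam 1 score vs. Exam 2 score, admitted and not-admitted students]
The two classes can be roughly separated by a straight line, so we only need to fit a line, which takes three parameters $\theta_0,\theta_1,\theta_2$.

The logistic function

Since this is logistic regression, we need the logistic function, which has a simple form:
$$g(z)=\frac{1}{1+e^{-z}}$$
It is easy to implement. The only thing to watch is to use element-wise operators (a dot before the operator, as in ./) so that the function also works on vectors and matrices: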

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly 
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).


g=1./(1.+exp(-z));


% =============================================================

end
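
As a quick sanity check of the element-wise form, the sketch below evaluates the same formula on a scalar, a vector and a matrix. It defines sigmoid as an anonymous function only so that the snippet runs on its own, and the input values are arbitrary.

```matlab
% Stand-alone copy of the logistic function as an anonymous function
sigmoid = @(z) 1 ./ (1 + exp(-z));

s0 = sigmoid(0);               % scalar input, exactly 0.5
sv = sigmoid([-10 0 10]);      % row vector input
sm = sigmoid([-1 1; 2 -2]);    % matrix input

% The output has the same shape as the input
disp(size(sv));
disp(size(sm));

% Symmetry of the logistic function: g(z) + g(-z) = 1
z = linspace(-5, 5, 11);
symErr = max(abs(sigmoid(z) + sigmoid(-z) - 1));
disp(symErr);
```

If ./ were replaced by /, the vector and matrix calls would no longer compute the element-wise result, which is why the dotted operator matters here.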

Cost-gradient function

This time we do not hand-write gradient descent. Instead we let MATLAB's fminunc solver do the minimization, so we need to write a cost-gradient function costFunction to pass to it. The function must return both the value of the cost function and the gradient with respect to every parameter.

The cost function is defined as
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(\vec{x}^{(i)}))+(1-y^{(i)})\log(1-h_\theta(\vec{x}^{(i)}))\right]$$
where $h_\theta(\vec{x})=g(\theta^T\vec{x})$. We already have the matrices
$$X=\begin{bmatrix}
x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)}\\
x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)}\\
\vdots & \vdots & \ddots & \vdots\\
x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)}
\end{bmatrix},\quad
\Theta=\begin{bmatrix}\theta_0\\ \theta_1\\ \vdots\\ \theta_n\end{bmatrix},\quad
Y=\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\end{bmatrix}$$
so the computation of the cost function can be vectorized as
$$J(\theta)=-\frac{1}{m}\left[Y^T\log(g(X\Theta))+(1-Y^T)\log(1-g(X\Theta))\right]$$
where $g$ and $\log$ are applied element-wise. The gradient is the column vector made up of the partial derivatives $\frac{\partial J(\theta)}{\partial \theta_j}$, each given by
$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m(h_\theta(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}$$
After vectorizing, the whole gradient can likewise be computed with one matrix product:
$$\frac{\partial J(\theta)}{\partial\Theta}=\frac{1}{m}X^T(g(X\Theta)-Y)$$
The code is as follows:

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%

h=sigmoid(X*theta);
J=y'*log(h)+(1.-y')*log(1.-h);
J=-J/m;
grad=X'*(h-y);
grad=grad./m;






% =============================================================

end
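
One way to gain confidence in a hand-written gradient before handing it to an optimizer is to compare it with a finite-difference approximation. The sketch below does this on a tiny made-up dataset. The names costJ, gradJ, delta and numGrad are introduced here for illustration, and the formulas are restated as anonymous functions only so that the snippet is self-contained.

```matlab
% Vectorized cost and gradient, restated as anonymous functions
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) -(y' * log(sigmoid(X * theta)) ...
        + (1 - y') * log(1 - sigmoid(X * theta))) / length(y);
gradJ = @(theta, X, y) X' * (sigmoid(X * theta) - y) / length(y);

% Tiny made-up dataset (first column is the intercept term)
X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.1];
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];

% With theta = 0 every prediction is 0.5, so the cost equals log(2)
J0 = costJ(zeros(3, 1), X, y);

% Central finite differences, one parameter at a time
delta = 1e-6;
numGrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = delta;
    numGrad(j) = (costJ(theta + e, X, y) - costJ(theta - e, X, y)) / (2 * delta);
end

% Largest disagreement between the analytic and numerical gradients
gradErr = max(abs(numGrad - gradJ(theta, X, y)));
disp(J0);
disp(gradErr);
```

A small gradErr suggests that the analytic gradient matches the cost function. The check is only meant for debugging, since it evaluates the cost twice per parameter.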


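Preprocessing the dataset and initializing the parameters

Preprocessing: prepend a column of ones to X.
Initialization: all zeros.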

%  Setup the data matrix appropriately
[m, n] = size(X);

% Add intercept term to X
X = [ones(m, 1) X];

% Initialize the fitting parameters
initial_theta = zeros(n + 1, 1);

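Solving with fminunc

The advantage of using fminunc directly is that we do not have to implement the optimizer ourselves. It also typically converges faster than plain gradient descent and does not require a hand-tuned learning rate $\alpha$.

The call to fminunc looks like this: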

%  Set options for fminunc
options = optimoptions(@fminunc,'Algorithm','Quasi-Newton','GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost 
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

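First we set the options for fminunc. Setting GradObj to on tells fminunc that the function we supply returns both the cost value and the gradient, so that it uses our gradient while searching for the minimum. We also set the maximum number of iterations, MaxIter, to 400. (GradObj and MaxIter are legacy option names. Recent MATLAB releases call them SpecifyObjectiveGradient and MaxIterations, and optimoptions still accepts the legacy names.)

Next we define an anonymous function of t, @(t)(costFunction(t, X, y)), and pass it in together with the initial value initial_theta and the options. fminunc then searches for a minimum of the function and returns the parameter values theta at that point along with the function value cost. theta holds the parameter vector we need.

Visualizing the learned result

To see how well the parameters work, we can plot the decision boundary together with the dataset. Again I borrow Andrew Ng's code: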

function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the 
%   positive examples and o for the negative examples. X is assumed to be 
%   a either 
%   1) Mx3 matrix, where the first column is an all-ones column for the 
%      intercept.
%   2) MxN, N>3 matrix, where the first column is all-ones

% Plot Data
plotData(X(:,2:3), y);
hold on

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)
    
    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour

    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off

end

The function is called as follows:

% Plot Boundary
plotDecisionBoundary(theta, X, y);
% Add some labels 
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

[Figure: training data with the fitted linear decision boundary]

It looks quite good.
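
The straight line drawn in the linear branch of plotDecisionBoundary follows from the prediction rule. A short derivation, where theta(1), theta(2), theta(3) in the code correspond to $\theta_0,\theta_1,\theta_2$ and $\theta_2\neq 0$ is assumed:

```latex
\begin{aligned}
h_\theta(\vec{x}) = g(\theta_0+\theta_1x_1+\theta_2x_2) \ge 0.5
  &\iff \theta_0+\theta_1x_1+\theta_2x_2 \ge 0 \\
\text{boundary: } \theta_0+\theta_1x_1+\theta_2x_2 = 0
  &\iff x_2 = -\frac{1}{\theta_2}\left(\theta_1x_1+\theta_0\right)
\end{aligned}
```

The first equivalence holds because $g(z)\ge 0.5$ exactly when $z\ge 0$. The last expression matches the line plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1)) in the function.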

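Prediction

With the parameters in hand we can make predictions: multiply the data to be predicted by the parameter vector, pass the result through the logistic function, and round to get the predicted label: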

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters. 
%               You should set p to a vector of 0's and 1's
%


p=round(sigmoid(X*theta));




% =========================================================================


end

The accuracy on the original (training) dataset can be computed with the following code:

% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
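
Rounding the sigmoid output is the same as thresholding the probability at 0.5, which in turn is the same as checking whether X*theta is non-negative. The sketch below illustrates this on made-up values. The parameter vector theta and the rows of X are hypothetical and are not the values learned from ex2data1.txt.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical parameters and examples (first column is the intercept)
theta = [-4; 1; 1];
X = [1 1 1; 1 2 2; 1 3 0.5; 1 4 4];

z = X * theta;                  % linear scores: -2, 0, -0.5, 4
pRound  = round(sigmoid(z));    % prediction as in predict.m
pThresh = double(z >= 0);       % equivalent test on the linear score

disp([z, sigmoid(z), pRound, pThresh]);

% Accuracy against made-up labels, computed as in the article
y = [0; 1; 1; 1];
acc = mean(double(pRound == y)) * 100;
disp(acc);
```

The second example lies exactly on the boundary (z = 0, probability 0.5) and is assigned to class 1, because MATLAB's round rounds halves away from zero.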

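Regularized binary logistic regression

In the problem above a straight line was enough, so we only used three parameters and did not need regularization. In the next problem, however, the dataset looks like this when plotted: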

%  The first two columns contains the X values and the third column
%  contains the label (y).
data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);
% Put some labels 
hold on;
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
% Specified in plot order
legend('y = 1', 'y = 0')
hold off;

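[Figure: scatter plot of Microchip Test 1 vs. Microchip Test 2, y = 1 and y = 0]
Clearly this problem cannot be solved with a straight line, so we need higher-order terms. But $x_1,x_2$ can be combined into infinitely many higher-order terms, and it is hard to judge by hand which ones to use.

Constructing higher-order terms

So we simply include every term from degree 0 to degree 6, using Andrew Ng's mapFeature function, which puts all 28 terms (that is, 28 features) into the matrix.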

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of 
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end

It is applied to the data as follows:

% Add Polynomial Features
% Note that mapFeature also adds a column of ones for us, so the intercept term is handled
X = mapFeature(X(:,1), X(:,2));
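
The count of 28 features can be checked directly: for degree d there are (d+1)(d+2)/2 monomials in two variables. The sketch below repeats the loop from mapFeature on two made-up sample points and compares the number of columns with that formula. The variable names x1, x2 and nTerms are chosen for this illustration.

```matlab
degree = 6;

% Number of monomials x1^a * x2^b with a + b <= degree
nTerms = (degree + 1) * (degree + 2) / 2;

% Two made-up sample points
x1 = [0.5; -0.2];
x2 = [1.0;  0.3];

% Same construction as in mapFeature
out = ones(size(x1));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (x1.^(i-j)) .* (x2.^j);
    end
end

disp(nTerms);
disp(size(out));
```

The columns are ordered by total degree: the constant 1 first, then x1, x2, then x1.^2, x1.*x2, x2.^2, and so on.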

Cost-gradient function

With regularization, both the cost function and the gradient gain a regularization term. The cost function becomes
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(\vec{x}^{(i)}))+(1-y^{(i)})\log(1-h_\theta(\vec{x}^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
and the gradient becomes
$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m(h_\theta(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\quad(j\ge 1)$$
So we only need to modify the unregularized function slightly: add $\lambda$ as an argument and add the regularization terms to the results, taking care that $\theta_0$ is not regularized (for $j=0$ the gradient is unchanged).

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta


h=sigmoid(X*theta);
J=y'*log(h)+(1.-y')*log(1.-h);
J=-J/m;
J=J+(lambda/(2*m)).*(theta'*theta-theta(1)^2);

grad=X'*(h-y);
grad=grad./m;
tmp=grad(1);
grad=grad+(lambda/m).*theta;
grad(1)=tmp;



% =============================================================

end
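
An alternative way to leave $\theta_0$ out of the penalty is to zero the first entry in a copy of theta and use that copy in both regularization terms. The sketch below compares this variant with the approach used in costFunctionReg on a small made-up problem. The data and the name thetaReg are illustrative assumptions.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Small made-up problem (values are arbitrary)
X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.1];
y = [1; 0; 1; 0];
theta = [0.5; -1; 2];
lambda = 1;
m = length(y);

h = sigmoid(X * theta);

% Variant A: as in costFunctionReg (subtract theta(1)^2, restore grad(1))
JA = -(y' * log(h) + (1 - y') * log(1 - h)) / m ...
     + (lambda / (2*m)) * (theta' * theta - theta(1)^2);
gradA = X' * (h - y) / m;
tmp = gradA(1);
gradA = gradA + (lambda / m) * theta;
gradA(1) = tmp;

% Variant B: zero the intercept in a copy of theta
thetaReg = theta;
thetaReg(1) = 0;
JB = -(y' * log(h) + (1 - y') * log(1 - h)) / m ...
     + (lambda / (2*m)) * (thetaReg' * thetaReg);
gradB = X' * (h - y) / m + (lambda / m) * thetaReg;

disp(abs(JA - JB));
disp(max(abs(gradA - gradB)));
```

Both variants give the same cost and gradient up to floating-point rounding. Variant B avoids the temporary variable and keeps the cost and gradient penalties visibly consistent.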


Solving and visualizing

Solving is not much different. We just need to set the parameter $\lambda$:

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

lambda = 1;
% Set Options
options = optimoptions(@fminunc,'Algorithm','Quasi-Newton','GradObj', 'on', 'MaxIter', 1000);

% Optimize
[theta, J, exit_flag] = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

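The result is quite good:
[Figure: decision boundary for $\lambda=1$]

Choosing the regularization parameter

The regularization parameter balances how closely the model fits the training set against the size of the parameters. A poorly chosen value can also cause overfitting or underfitting.

With $\lambda=0$, the model overfits:
[Figure: decision boundary for $\lambda=0$]
With $\lambda=100$, the model underfits:
[Figure: decision boundary for $\lambda=100$]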
