赞
踩
Logistic模型为(
x
i
\mathbf{x_i}
xi是第
i
i
i个样本,为行向量,
w
\mathbf{w}
w是特征向量,为一个列向量)
KaTeX parse error: Unknown column alignment: * at position 60: … \begin{array}{*̲*lr**} p(y_{i}…
其损失函数为
l
=
L
o
s
s
(
w
)
=
−
1
m
ln
p
(
y
∣
x
,
w
)
=
−
[
y
T
ln
(
σ
)
+
(
1
−
y
)
T
ln
(
1
−
σ
)
]
.
l=Loss(\mathbf{w})=-\frac{1}{m}\ln{p(\mathbf{y}|\mathbf{x},\mathbf{w})}=-[\mathbf{y}^{T}\ln(\mathbf{\sigma})+\mathbf{(1-y)}^{T}\ln{(1-\sigma)}].
l=Loss(w)=−m1lnp(y∣x,w)=−[yTln(σ)+(1−y)Tln(1−σ)].
则损失函数的一阶导数为
∂
l
∂
w
j
=
1
m
∑
i
=
1
m
(
σ
i
−
y
i
)
x
i
j
.
\frac{\partial{l}}{\partial{w_{j}}}=\frac{1}{m} \sum_{i=1}^m{(\sigma_{i}-y_{i})x_{ij}}.
∂wj∂l=m1i=1∑m(σi−yi)xij.
即可得向量形式
∂
l
∂
w
=
−
1
m
∂
[
y
T
ln
(
s
i
g
m
o
i
d
(
X
w
)
)
+
(
1
−
y
)
T
ln
(
1
−
s
i
g
m
o
i
d
(
X
w
)
)
]
∂
w
,
\frac{\partial{l}}{\partial{\mathbf{w}}}=-\frac{1}{m}\frac{\partial{[\mathbf{y}^{T}\ln(sigmoid(X\mathbf{w}))+\mathbf{(1-y)}^{T}\ln{(1-sigmoid(X\mathbf{w}))}]}}{\partial{\mathbf{w}}},
∂w∂l=−m1∂w∂[yTln(sigmoid(Xw))+(1−y)Tln(1−sigmoid(Xw))],
d l = − 1 m t r ( ( y ⊙ σ ( 1 − σ ) σ ) T ( d X w ) − ( ( 1 − y ) ⊙ σ ( 1 − σ ) 1 − σ ) T ( d X w ) ) \mathrm{d}l=-\frac{1}{m}tr((\mathbf{y}\odot \frac{\sigma(1-\sigma)}{\sigma})^T(\mathrm{d}X\mathbf{w})-((1-\mathbf{y})\odot\frac{\sigma(1-\sigma)}{1-\sigma})^T(\mathrm{d}X\mathbf{w})) dl=−m1tr((y⊙σσ(1−σ))T(dXw)−((1−y)⊙1−σσ(1−σ))T(dXw))
= − 1 m t r ( ( y ⊙ σ ( 1 − σ ) σ ) T X ( d w ) − ( ( 1 − y ) ⊙ σ ( 1 − σ ) 1 − σ ) T X ( d w ) ) =-\frac{1}{m}tr((\mathbf{y}\odot \frac{\sigma(1-\sigma)}{\sigma})^TX(\mathrm{d}\mathbf{w})-((1-\mathbf{y})\odot\frac{\sigma(1-\sigma)}{1-\sigma})^TX(\mathrm{d}\mathbf{w})) =−m1tr((y⊙σσ(1−σ))TX(dw)−((1−y)⊙1−σσ(1−σ))TX(dw))
= − 1 m t r ( ( X T ( y ⊙ σ ( 1 − σ ) σ ) ) T ( d w ) − ( X T ( ( 1 − y ) ⊙ σ ( 1 − σ ) 1 − σ ) ) T ( d w ) ) = − 1 m t r ( ( X T ( y − σ ) ) T ( d w ) ) , =-\frac{1}{m}tr((X^T(\mathbf{y}\odot \frac{\sigma(1-\sigma)}{\sigma}))^T(\mathrm{d}\mathbf{w})-(X^T((1-\mathbf{y})\odot\frac{\sigma(1-\sigma)}{1-\sigma}))^T(\mathrm{d}\mathbf{w}))=-\frac{1}{m}tr((X^T(\mathbf{y}-\sigma))^T(\mathrm{d}\mathbf{w})), =−m1tr((XT(y⊙σσ(1−σ)))T(dw)−(XT((1−y)⊙1−σσ(1−σ)))T(dw))=−m1tr((XT(y−σ))T(dw)),
d l = t r ( ∂ l ∂ w T d w ) , \mathrm{d}l=tr(\frac{\partial l}{\partial \mathbf{w}}^T\mathrm{d}\mathbf{w}), dl=tr(∂w∂lTdw),
∂ l ∂ w = 1 m X T ( σ − y ) , \frac{\partial l}{\partial \mathbf{w}}=\frac{1}{m}X^T(\sigma-\mathbf{y}), ∂w∂l=m1XT(σ−y),
接着求解损失函数的二阶导数(Heissian矩阵)
H
i
j
=
∂
2
l
∂
w
i
∂
w
j
=
1
m
∂
∂
w
j
∑
t
=
1
m
(
y
t
−
σ
t
)
x
t
i
=
1
m
∑
t
=
1
m
σ
t
(
σ
t
−
1
)
x
t
i
x
t
j
.
H_{ij}=\frac{\partial^{2}{l}}{\partial{w_{i}}\partial{w_{j}}}=\frac{1}{m}\frac{\partial}{\partial{w_{j}}}\sum_{t=1}^{m}{(y_{t}-\sigma_{t})x_{ti}}=\frac{1}{m}\sum_{t=1}^{m}\sigma_{t}(\sigma_{t}-1)x_{ti}x_{tj}.
Hij=∂wi∂wj∂2l=m1∂wj∂t=1∑m(yt−σt)xti=m1t=1∑mσt(σt−1)xtixtj.
即可得矩阵形式
d
∇
w
l
=
1
m
X
T
(
s
i
g
m
o
i
d
′
(
X
w
)
⊗
(
d
X
w
)
)
=
1
m
X
T
(
s
i
g
m
o
i
d
′
(
X
w
)
⊗
X
(
d
w
)
)
,
\mathrm{d}\nabla_w l=\frac{1}{m}X^T(sigmoid'(Xw)\otimes(\mathrm{d}Xw))=\frac{1}{m}X^T(sigmoid'(Xw)\otimes X(\mathrm{d}w)),
d∇wl=m1XT(sigmoid′(Xw)⊗(dXw))=m1XT(sigmoid′(Xw)⊗X(dw)),
v e c ( d ∇ w l ) = 1 m X T d i a g ( σ ( 1 − σ ) ) X v e c ( d w ) , \mathrm{vec}(\mathrm{d}\nabla_w l)=\frac{1}{m}X^Tdiag(\sigma(1-\sigma))X\mathrm{vec}(\mathrm{d}w), vec(d∇wl)=m1XTdiag(σ(1−σ))Xvec(dw),
v e c ( d ∇ w l ) = ∇ w 2 l T v e c ( d w ) , \mathrm{vec}(\mathrm{d}\nabla_w l) = \nabla^2_w l^T \mathrm{vec}(\mathrm{d}w), vec(d∇wl)=∇w2lTvec(dw),
H = ∇ w 2 l = H ( w ) = 1 m X A X T , H=\nabla^2_w l=H(\mathbf{w})=\frac{1}{m}XAX^{T}, H=∇w2l=H(w)=m1XAXT,
A = d i a g ( σ i ( 1 − σ i ) ) . A=diag(\sigma_{i}(1-\sigma_{i})). A=diag(σi(1−σi)).
由于 H H H 是一个二次型矩阵,且 A A A 的各阶顺序主子式均大于0,故 H H H 为正定矩阵,因此损失函数 l l l 为凸函数,有最小值,可通过求解 ∂ l ∂ w = 0 \frac{\partial{l}}{\partial{\mathbf{w}}}=0 ∂w∂l=0得到最优解 w \mathbf{w} w ,且牛顿法求解二次收敛,故具有较好的求解效果。
根据牛顿法的求解步骤,有下列迭代过程
w
:
=
w
−
α
(
H
−
1
∂
l
∂
w
)
.
\mathbf{w}:=\mathbf{w}-\alpha(H^{-1}\frac{\partial{l}}{\partial{\mathbf{w}}}).
w:=w−α(H−1∂w∂l).
其中,方向 H − 1 ∂ l ∂ w H^{-1}\frac{\partial{l}}{\partial{\mathbf{w}}} H−1∂w∂l 被称为牛顿方向,由于牛顿法只是二阶近似,因此需要在每次的迭代方向前乘以一个学习率 α \alpha α 作为对高阶无穷小(高阶导数项)的近似,即 α ( H − 1 ∂ l ∂ w ) \alpha(H^{-1}\frac{\partial{l}}{\partial{\mathbf{w}}}) α(H−1∂w∂l) 是每次牛顿法的迭代步长。
sigmoid.m函数(用于计算sigmoid函数值)
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
g=1./(1+exp(-z));
end
Loss_function.m函数(求解模型在所有样本上的误差)
function J = loss_function(w, X, y) %%LOSS_FUNCTION computes the loss of the logistic regression model % J = LOSS_FUNCTION(w, X, y) computes the loss of the logistic %regression model of all samples with weight w. m = size(X, 1); J = 0; for i = 1:m if y(i) == 1 J = J - log(sigmoid(X(i,:) * w)); else J = J - log(1 - sigmoid(X(i,:) * w)); end end J = J / m; end
Loss_d1.m函数(用于计算损失函数的一阶导数,为一个向量)
function dl = Loss_d1(w, X, y)
%LOSS_D1 Compute the derivative
% dl = LOSS_D1(w, X, y) computes the derivative of the loss function
%of the logistic regression.
m = size(y, 1);
dl = X' * (sigmoid(X * w) - y) / m;
end
Loss_d2.m函数(用于计算损失函数的二阶导数,为一个Hessian矩阵)
function H = Loss_d2(w, X, y)
%LOSS_D2 Compute the Hessian Matrix
% H = LOSS_D2(w, X, y) computes the Hessian Matrix of the loss function
%of the logistic regression.
m = size(y, 1);
n = zeros(1, m);
for i = 1:m
n(i) = sigmoid(X(i,:) * w) * (1 - sigmoid(X(i,:) * w));
end
A = diag(n);
H = X' * A * X / m;
end
newton_method.m函数(牛顿法求解)
function [out, iteration, weight] = newton_method(alpha, w, X, y) %%NEWTON_METHOD Compute the solution of a the function % [out, iteration] = NEWTON_METHOD(w, X, y) gives the predicted results, %the weight vector of the logistic regression and the iteration. %limit is the maximum iteration %alpha is the learning rate limit = 1000000; %Iterative Solution (Newton Method) for i = 1:limit dl = Loss_d1(w, X, y); H = Loss_d2(w, X, y); w_next = w - alpha * (pinv(H) * dl); %Use pinv in case the matrix is approaching a singular matrix if loss_function(w_next, X, y) < 1e-6 %The condition to judge the convergence w = w_next; break end w = w_next; end %Compute the result with the weight got above and predict all the samples out = sigmoid(X * w); iteration = i; weight = w; end
evaluation.m函数(用于绘制ROC曲线并计算AUC)
function auc = evaluation(out, y, alpha) %EVALUATION Evaluate the performance of the model % auc = EVALUATION(out, y) evaluate the performance by ploting the ROC %curve and computing the AUC. %(xx, yy) is the initial point %x_step is the step when we meet FP %y_step is the step when we meet TP xx = 1.0; yy = 1.0; pos_num = sum(y == 1); neg_num = sum(y == 0); x_step = 1.0 / neg_num; y_step = 1.0 / pos_num; %Sort the predicted data and use the index to sort the real %value y to judge whether the value is TP or FP [out, index] = sort(out); y = y(index); X(1) = 1; Y(1) = 1; for i = 1:length(y) if y(i) == 1 % TP yy = yy - y_step; else %FP xx =xx - x_step; end X(i + 1) = xx; Y(i + 1) = yy; end %Plot the ROC curve name = "alpha" + num2str(alpha) + '.jpg'; figure('name', name); plot(X, Y, '-ro', 'LineWidth', 2, 'MarkerSize', 3); xlabel('FPR'); ylabel('TPR'); title('ROC'); f = gca; f.XAxisLocation = 'origin'; f.YAxisLocation = 'origin'; f.XLim = [0 1.1]; f.YLim = [0 1.1]; saveas(gcf, name); %Calculate the area below the ROC curve auc = -trapz(X,Y); end
Logistic_Reg.m(模型求解)
%'data.xlsx' is the dataset %X is the feature matrix %y is the classification vector %m is the amount of the features %n is the amount of samples %w_initial is the initial weight vector, whose value is a one vector %w is the weight vector obtained by using newton method %alpha is the learning rate %out is the predicted results %auc is the evalution standard X = xlsread('data.xlsx','A:B'); y = xlsread('data.xlsx','C:C'); X = [ones(size(X,1), 1) X];%Add one colomn of 1s as the constant terms y = (y == 1);%Change the label '-1' to '0' to make the computation easier y = double(y); m = size(X, 2); n = size(X, 1); w_initial = ones(m, 1); alpha = 0.05; w = zeros(80, m); out = zeros(n, 80); iteration = zeros(80, 1); auc = zeros(80, 1); %Solve the problem by using newton method and evaluate the performance at %the same time for i = 1:80 [out(:,i), iteration(i), w(i,:)] = newton_method(alpha, w_initial, X, y); auc(i) = evaluation(out(:,i), y, alpha); alpha = alpha + 0.05; end save('out.mat'); save('iteration.mat'); save('auc.mat'); save('w.mat');
利用2中的evalution.m函数即可根据分类结果out绘制相应的ROC曲线并计算AUC面积。
其中,ROC曲线的纵坐标为 T P R = T P T P + F N TPR=\frac{TP}{TP+FN} TPR=TP+FNTP ,横坐标为 F P R = F P F P + T N FPR=\frac{FP}{FP+TN} FPR=FP+TNFP ,曲线的趋势变化反映了当取不同阈值p的时候,TPR和FPR的变化。
绘制过程可按以下流程:
由于牛顿法只是对于问题求解的二阶近似,故可以仿照梯度下降法引入一个学习率 α \alpha α 来作为对高阶无穷小的近似,然后通过性能对比来选择一个收敛速度较快的学习率。
通常可以用一定的步长来对学习率 α \alpha α 在一定区间内进行迭代,然后分别进行评估来对比不同的学习率。
l = L o s s ( w ) = − 1 m ln p ( y ∣ x , w ) + λ w T w = − [ y T ln ( σ ) + ( 1 − y ) T ln ( 1 − σ ) ] + λ w T w . l=Loss(\mathbf{w})=-\frac{1}{m}\ln{p(\mathbf{y}|\mathbf{x},\mathbf{w})}+\lambda \mathbf{w}^T\mathbf{w}=-[\mathbf{y}^{T}\ln(\mathbf{\sigma})+\mathbf{(1-y)}^{T}\ln{(1-\sigma)}]+\lambda \mathbf{w}^T\mathbf{w}. l=Loss(w)=−m1lnp(y∣x,w)+λwTw=−[yTln(σ)+(1−y)Tln(1−σ)]+λwTw.
目前的评估标准仅仅使用了ROC曲线以及其面积AUC,还可以考虑通过引入准确率 p r e c i s i o n = T P T P + F P precision=\frac{TP}{TP+FP} precision=TP+FPTP , r e c a l l = T P T P + F N recall=\frac{TP}{TP+FN} recall=TP+FNTP 来对模型进行综合评价以期得到更加准确的评估结果。
当前模型的选择仅建立在训练集的基础上,后续若要进一步提升性能可将按一定比例将数据集划分为训练集、验证集和测试集三部分来分别进行不同参数的选择,可以有效提高泛化性能。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。