The binomial logistic regression model:

$$P(Y=0|x)=\frac{1}{1+e^{\omega \cdot x+b}}$$
$$P(Y=1|x)=\frac{e^{\omega \cdot x+b}}{1+e^{\omega \cdot x+b}}$$
Here $x \in R^n$ is the input, $Y \in \{0,1\}$ is the output, and $\omega \in R^n$ and $b \in R$ are the parameters: $\omega$ is called the weight vector and $b$ the bias, and $\omega \cdot x$ denotes the inner product of $\omega$ and $x$. For convenience, the weight and input vectors are sometimes augmented and still written as $\omega$ and $x$, but with $\omega=(\omega^1,\omega^2,...,\omega^n,b)^T$ and $x=(x^1,x^2,...,x^n,1)^T$. In that case the binomial logistic regression model becomes:
$$P(Y=0|x)=\frac{1}{1+e^{\omega \cdot x}}$$
$$P(Y=1|x)=\frac{e^{\omega \cdot x}}{1+e^{\omega \cdot x}}$$
Define the sigmoid function as

$$sigmoid(z)=\frac{1}{1+e^{-z}}$$

so that $P(Y=1|x)=sigmoid(\omega \cdot x)$.
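In code, $e^{-z}$ overflows for large negative $z$, so implementations usually guard the exponent. A minimal sketch (the cutoff is an arbitrary implementation choice; the full program at the end of this post uses $-10$):

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z < -30:  # exp(-z) would overflow; the true value is ~0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))
```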
Estimating the model parameter $\omega$ by maximum likelihood
Let $P(Y=1|x)=\pi(x)$ and $P(Y=0|x)=1-\pi(x)$. The likelihood function is then

$$\prod_{i=1}^{N}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}$$
and the log-likelihood is

$$L(\omega)=\sum_{i=1}^{N}\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]$$
$$=\sum_{i=1}^{N}\left[y_i(\omega \cdot x_i)-\log(1+e^{\omega \cdot x_i})\right]$$
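The second line follows by substituting $\pi(x_i)=\frac{e^{\omega \cdot x_i}}{1+e^{\omega \cdot x_i}}$; the expansion is routine but spelled out here for completeness:

$$y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))=y_i(\omega \cdot x_i)-\log(1+e^{\omega \cdot x_i})$$

since the log-odds $\log\frac{\pi(x_i)}{1-\pi(x_i)}$ equals $\omega \cdot x_i$ and $1-\pi(x_i)=\frac{1}{1+e^{\omega \cdot x_i}}$.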
Without a regularization term, the loss function (the negative log-likelihood) is

$$L(\omega)=\sum_{i=1}^{N}\left[-y_i(\omega \cdot x_i)+\log(1+e^{\omega \cdot x_i})\right]$$
With an $L_2$ regularization term, the loss function is

$$L(\omega)=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]+\frac{\lambda}{2N}\|\omega\|_2^2$$
$$=\frac{1}{N}\sum_{i=1}^{N}\left[-y_i(\omega \cdot x_i)+\log(1+e^{\omega \cdot x_i})\right]+\frac{\lambda}{2N}\|\omega\|_2^2$$
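A vectorized sketch of this regularized loss (names are illustrative; np.logaddexp(0, z) computes log(1 + e^z) stably, a standard trick rather than anything mandated by the text):

```python
import numpy as np

def reg_loss(w, X, y, lam):
    """(1/N) * sum[-y_i (w.x_i) + log(1 + e^{w.x_i})] + lam/(2N) * ||w||^2."""
    z = X @ w                                            # z_i = w . x_i, shape (N,)
    data_term = np.mean(-y * z + np.logaddexp(0.0, z))   # stable log(1 + e^z)
    return data_term + lam / (2 * len(y)) * np.dot(w, w)
```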
The estimate of $\omega$ is obtained by maximizing the log-likelihood $L(\omega)$, or equivalently by minimizing the loss above.
Solving for $\omega$ without the regularization term, the stochastic gradient for a single sample $(x_i, y_i)$ is

$$\frac{\partial L}{\partial \omega_j}=x_{ij}\left(-y_i+sigmoid(\omega \cdot x_i)\right)$$
From here $\omega$ can be solved by stochastic gradient descent.
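A minimal sketch of that SGD loop under the conventions above (lamda is the learning rate and door the iteration count, matching the variable names used later; the default values are illustrative):

```python
import numpy as np

def sgd(X, y, lamda=0.01, door=200, seed=0):
    """Stochastic gradient descent on the unregularized logistic loss.
    X has an appended column of ones so the bias lives inside w."""
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[1])
    for _ in range(door):
        i = rng.integers(len(y))                         # pick one random sample
        z = X[i] @ w
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))   # stable sigmoid(w . x_i)
        w -= lamda * X[i] * (p - y[i])                   # dL/dw_j = x_ij (sigmoid(w.x_i) - y_i)
    return w
```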
Likewise, with the regularization term the gradient is

$$\frac{\partial L}{\partial \omega_j}=\frac{1}{N}\left[x_{ij}\left(-y_i+sigmoid(\omega \cdot x_i)\right)+\lambda\,\omega_j\right]$$
Satisfying the (naive) Bayes assumption: the off-diagonal covariances are zero, so the feature dimensions are independent, for example

$$cov=\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$$
Violating the Bayes assumption: when a covariance is nonzero, the two dimensions are correlated and hence not independent; for example, with both dimensions correlated:

$$cov=\begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$$
Multidimensional Gaussian data can be generated with numpy.random.multivariate_normal. Read the CSV data with pandas.read_csv, convert the resulting DataFrame to an ndarray via .values, then slice the matrix and append a column of ones to produce x and y, as sketched below.
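A sketch of that pipeline (the file name and class means are illustrative, loosely mirroring the full program at the end of this post):

```python
import numpy as np
import pandas as pd

# Generate two Gaussian classes; a diagonal cov satisfies the naive Bayes assumption.
cov = [[1, 0], [0, 1]]
c0 = np.random.multivariate_normal([2, 3], cov, 500)   # class 1
c1 = np.random.multivariate_normal([7, 8], cov, 500)   # class 0
data = np.vstack((np.hstack((c0, np.ones((500, 1)))),
                  np.hstack((c1, np.zeros((500, 1))))))
np.savetxt("gauss.csv", data, delimiter=",")

# Load with pandas, convert to ndarray, then slice into features and labels.
m = pd.read_csv("gauss.csv", header=None).values
x = np.hstack((m[:, :-1], np.ones((len(m), 1))))       # append 1 for the bias term
y = m[:, -1]
```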
Choose an iteration count door; on each iteration pick one sample and update according to the formula below, observing how convergence varies with the number of iterations:
$$\frac{\partial L}{\partial \omega_j}=x_{ij}\left(-y_i+sigmoid(\omega \cdot x_i)\right)$$
For Newton's method, compute the Hessian matrix once per iteration; after the chosen number of iterations this yields $\omega$.
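A sketch of one full-batch Newton step for this loss (a standard formulation rather than the exact program below, which pairs the Hessian with a stochastic gradient; it assumes the Hessian is invertible, and the names are illustrative):

```python
import numpy as np

def newton_step(w, X, y):
    """One Newton update: w <- w - H^{-1} g on the logistic log-loss."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))   # predicted P(Y=1|x_i)
    g = X.T @ (p - y)                                # gradient of the log-loss
    H = X.T @ (X * (p * (1 - p))[:, None])           # Hessian: sum_i p_i(1-p_i) x_i x_i^T
    return w - np.linalg.solve(H, g)                 # solve instead of explicit inverse
```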
To predict, compute $\omega \cdot x$ and compare with 0: predict 1 if the value is greater than or equal to zero, and 0 if it is negative. Count the correctly predicted samples to obtain the accuracy.
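As a snippet (assuming x and y from the loading sketch above and a trained weight vector w):

```python
pred = (x @ w >= 0).astype(int)   # decision rule: w.x >= 0 -> predict 1, else 0
accuracy = float((pred == y).mean())
print("accuracy =", accuracy)
```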
The results suggest that with lamda = 0.01 the learning rate is fairly large (overfitting is not discussed here), so relatively few iterations are needed: roughly 100-200 steps give a good fit, while beyond 200 steps oscillation sets in and the accuracy actually drops. Because stochastic gradient descent picks samples at random, the classification quality is not identical from run to run. Overall, lamda = 0.01, steps = 200, and a regularization factor regex = 0.0001 gave the best classification results.
(Plots: results without regularization; results with regularization.)
Optimizing with stochastic gradient descent, the accuracy varies between runs because samples are chosen at random, but it exceeds 0.85 most of the time and occasionally reaches 0.95 or higher. As the plots show, whether or not the regularization term is added, satisfying the Bayes assumption has little effect on the accuracy of logistic regression, though data satisfying the assumption classify slightly better.
A tax-fraud dataset was taken from the UCI repository; after feature selection only four dimensions remain, and classification was compared with and without regularization. Compared with gradient descent, Newton's method needs far fewer iterations and reaches the optimum more reliably, so its accuracy easily exceeds 0.98 and sometimes reaches 1.0.
The complete implementation:

```python
"""
by xiaoyi
"""
import math
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


class Logistic:
    def __init__(self):
        self.matrix = None       # training data matrix as read from file
        self.test_matrix = None  # test data matrix
        self.y = None            # labels; y[i] is the class of sample i
        self.test_y = None       # test-set labels
        self.x = None            # feature matrix; row i is sample i, last entry is 1
        self.test_x = None       # test-set feature matrix
        self.w = None            # weight vector for the augmented features
        self.n = 0               # number of features; w has n + 1 entries
        self.dataSum = 0         # number of training samples
        self.testSum = 0         # number of test samples

    # Sigmoid function, guarded against overflow in exp().
    @staticmethod
    def sig(wx):
        if wx < -10:
            return 0  # exp(-wx) would be huge; the true value is ~0
        return 1 / (1 + math.exp(-wx))

    # Loss (negative log-likelihood), no regularization.
    def cal_loss1(self):
        loss = 0
        for i in range(self.dataSum):
            w_multi_x = float(np.dot(self.x[i], self.w))
            loss -= self.y[i] * w_multi_x
            # Split on the sign of wx so log(1 + e^{wx}) never overflows.
            if w_multi_x > 0:
                loss += w_multi_x + math.log(1 + math.exp(-w_multi_x))
            else:
                loss += math.log(1 + math.exp(w_multi_x))
        return loss

    # Loss with the L2 regularization term added.
    def cal_loss2(self, regex):
        loss = self.cal_loss1()
        # The 1/(2N) scaling from the text is folded into regex here.
        loss += regex * np.dot(self.w.T, self.w)[0, 0]
        return loss

    # Stochastic gradient for one random sample, without regularization.
    def cal_gradient1(self):
        gradient = np.zeros((self.n + 1, 1))
        i = random.randint(0, self.dataSum - 1)
        wx = float(np.dot(self.x[i], self.w))
        for j in range(self.n + 1):
            gradient[j][0] += self.x[i][j] * (-self.y[i] + Logistic.sig(wx))
        return gradient

    # Stochastic gradient with the regularization term added.
    def cal_gradient2(self, regex):
        return self.cal_gradient1() + regex * self.w

    # Optimize w by stochastic gradient descent, no regularization.
    def de_gradient1(self, lamda, door):
        loss0 = self.cal_loss1()
        w0 = self.w
        # Out-of-place update so w0 keeps the previous weights
        # (the original in-place "-=" aliased w0 to self.w and never rolled back).
        self.w = self.w - lamda * self.cal_gradient1()
        loss1 = self.cal_loss1()
        cnt = 0
        while cnt < door:
            cnt += 1
            loss0 = loss1
            w0 = self.w
            self.w = self.w - lamda * self.cal_gradient1()
            loss1 = self.cal_loss1()
        self.w = w0  # keep the weights that produced loss0
        return loss0

    # Optimize w by stochastic gradient descent with regularization.
    def de_gradient2(self, lamda, door, regex):
        loss0 = self.cal_loss2(regex)
        w0 = self.w
        self.w = self.w - lamda * self.cal_gradient2(regex)
        loss1 = self.cal_loss2(regex)
        cnt = 0
        while cnt < door:
            cnt += 1
            loss0 = loss1
            w0 = self.w
            self.w = self.w - lamda * self.cal_gradient2(regex)
            loss1 = self.cal_loss2(regex)
        self.w = w0
        return loss0

    # Hessian of the log-loss: sum_i p_i (1 - p_i) x_i x_i^T.
    def hessian(self):
        he = np.zeros((self.n + 1, self.n + 1))
        for i in range(self.dataSum):
            w_multi_x = float(np.dot(self.x[i], self.w))
            if w_multi_x > 20:
                continue  # p(1-p) vanishes here; skip the useless work
            p = Logistic.sig(w_multi_x)
            for j in range(self.n + 1):
                for k in range(self.n + 1):
                    he[j][k] += self.x[i][j] * self.x[i][k] * p * (1 - p)
        return he

    # Newton's method: w <- w - H^{-1} g, recomputing the Hessian each iteration.
    def newton(self, steps):
        cnt = 0
        w0 = self.w
        while cnt < steps:
            cnt += 1
            g = self.cal_gradient1()
            he = self.hessian()
            w0 = self.w
            self.w = self.w - np.dot(np.linalg.inv(he), g)
        self.w = w0  # roll back the last step, mirroring the gradient routines

    # Read the training set; the CSV files generated below have no header row.
    def read_data(self, file):
        self.matrix = pd.read_csv(file, header=None).values
        self.dataSum = len(self.matrix)
        self.n = len(self.matrix[0]) - 1
        add = np.ones((self.dataSum, 1))
        self.x = np.hstack((self.matrix[:, :self.n], add))  # append 1 for the bias
        self.y = self.matrix[:, self.n]
        self.w = np.ones((self.n + 1, 1))

    # Read the test set.
    def read_test_data(self, file):
        self.test_matrix = pd.read_csv(file, header=None).values
        self.testSum = len(self.test_matrix)
        self.test_x = np.hstack((self.test_matrix[:, :self.n],
                                 np.ones((self.testSum, 1))))
        self.test_y = self.test_matrix[:, self.n]

    # Predict on the test set and return the accuracy.
    def pre_test(self):
        cnt = 0
        for i in range(self.testSum):
            pre_wx = float(np.dot(self.test_x[i], self.w))
            pred = 1 if pre_wx >= 0 else 0  # decision rule: w.x >= 0 -> class 1
            if pred == self.test_y[i]:
                cnt += 1
        return cnt / self.testSum


def test_model():
    # Train, evaluate, and plot the decision boundary.
    test = Logistic()
    train_set = "gauss.csv"
    test_set = "test_gauss.csv"
    test.read_data(train_set)
    lamda = 1e-2
    steps = 10
    regex = 1e-3
    # test.de_gradient2(lamda, steps, regex)
    # test.de_gradient1(lamda, steps)
    test.newton(steps)
    test.read_test_data(test_set)
    correct = test.pre_test()
    print(correct)
    # The first 500 test rows are class 1, the rest class 0 (see generate_data).
    x0 = test.test_matrix[:500, 0]
    y0 = test.test_matrix[:500, 1]
    x1 = test.test_matrix[500:, 0]
    y1 = test.test_matrix[500:, 1]
    plt.scatter(x0, y0, marker='.', color='lightgreen')
    plt.scatter(x1, y1, marker='+', color='lightskyblue')
    # Decision boundary: w0 * x + w1 * y + w2 = 0.
    dx = np.linspace(0, 10, 100)
    dy = (-test.w[2][0] - test.w[0][0] * dx) / test.w[1][0]
    plt.plot(dx, dy, color='y')
    ans = "shot rate= " + str(correct)
    plt.text(0, 1, ans, color='hotpink', fontsize=15)
    plt.show()


def generate_data():
    # Generate two 2-D Gaussian classes. This cov has correlated dimensions,
    # so the data violate the naive Bayes assumption.
    f = open('test_gauss_not_bayes.csv', 'w')
    cov = np.array([[2, 1], [1, 2]])
    x0 = np.random.multivariate_normal([2, 3], cov, 500).T
    x1 = np.random.multivariate_normal([7, 8], cov, 500).T
    for i in range(len(x0.T)):
        line = [x0[0][i], x0[1][i], 1]  # class 1
        f.write(",".join(str(v) for v in line) + "\n")
    for i in range(len(x1.T)):
        line = [x1[0][i], x1[1][i], 0]  # class 0
        f.write(",".join(str(v) for v in line) + "\n")
    f.close()


test_model()
```