For a multi-class problem, the class label $y\in\{1,2,\cdots,C\}$ can take one of $C$ values. Given a sample ${\bf x}$, Softmax regression predicts the conditional probability that it belongs to class $c$ as
$$p(y=c\mid{\bf x})={\rm softmax}({\bf w}_c^{\rm T}{\bf x})=\frac{e^{{\bf w}_c^{\rm T}{\bf x}}}{\sum_{c'=1}^{C}e^{{\bf w}_{c'}^{\rm T}{\bf x}}}$$
where ${\bf w}_c$ is the weight vector of the $c$-th class.
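As a quick numerical illustration (a minimal sketch added here, with made-up per-class scores), the snippet below evaluates the softmax for a toy score vector ${\bf w}_c^{\rm T}{\bf x}$ and confirms that the resulting probabilities sum to 1:

import numpy as np

# Made-up per-class scores w_c^T x for C = 3 classes
scores = np.array([2.0, 0.5, -1.0])

# Subtracting the maximum score before exponentiating avoids overflow
# and leaves the probabilities unchanged
e_z = np.exp(scores - scores.max())
probs = e_z / e_z.sum()

print(probs)        # ≈ [0.786 0.175 0.039]
print(probs.sum())  # 1.0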
The decision function of Softmax regression can be written as
$$\hat{y}=\overset{C}{\underset{c=1}{{\rm argmax}}}\ p(y=c\mid{\bf x})=\overset{C}{\underset{c=1}{{\rm argmax}}}\ {\bf w}_c^{\rm T}{\bf x}$$
It can be seen that when $C=2$, this reduces to the Logistic regression discussed earlier.
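To make this explicit, for $C=2$ divide the numerator and denominator of the softmax by $e^{{\bf w}_2^{\rm T}{\bf x}}$:

$$p(y=1\mid{\bf x})=\frac{e^{{\bf w}_1^{\rm T}{\bf x}}}{e^{{\bf w}_1^{\rm T}{\bf x}}+e^{{\bf w}_2^{\rm T}{\bf x}}}=\frac{1}{1+e^{-({\bf w}_1-{\bf w}_2)^{\rm T}{\bf x}}}=\sigma\big(({\bf w}_1-{\bf w}_2)^{\rm T}{\bf x}\big)$$

where $\sigma(\cdot)$ is the Logistic function, so the two-class case is Logistic regression with the single effective weight vector ${\bf w}_1-{\bf w}_2$.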
Gradient descent can still be used to optimize the parameters.
Given $N$ training samples $\{({\bf x}^{(n)},y^{(n)})\}_{n=1}^{N}$, the cross-entropy loss is used to optimize the parameter matrix
${\bf W}$. For convenience, the class label is represented by a $C$-dimensional one-hot vector ${\bf y}\in\{0,1\}^{C}$.
The risk function is
$$\mathcal R({\bf W})=-\frac{1}{N}\sum_{n=1}^{N}({\bf y}^{(n)})^{\rm T}\log{\hat {\bf y}}^{(n)}$$
where ${\hat {\bf y}}^{(n)}$ is the vector of the sample's posterior probabilities over the classes.
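Because ${\bf y}^{(n)}$ is one-hot, the inner product in $\mathcal R({\bf W})$ simply selects the log-probability that the model assigns to the true class:

$$({\bf y}^{(n)})^{\rm T}\log{\hat {\bf y}}^{(n)}=\log \hat{y}^{(n)}_{y^{(n)}}$$

so each sample contributes $-\log$ of the predicted probability of its own label, which is exactly the usual cross-entropy loss.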
The gradient of the risk function with respect to ${\bf W}$ is
$$\frac{\partial {\mathcal R}({\bf W})}{\partial {\bf W}}=-\frac{1}{N}\sum_{n=1}^{N}{\bf x}^{(n)}({\bf y}^{(n)}-{\hat {\bf y}}^{(n)})^{\rm T}$$
Training can therefore be carried out with gradient descent.
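A quick way to validate this gradient formula (a minimal sketch added here, not part of the original post; all names are illustrative) is to compare the analytic gradient against a finite-difference estimate on random data:

import numpy as np

def risk_and_grad(W, X, Y):
    # Cross-entropy risk R(W) and its analytic gradient for softmax regression.
    # X: (N, D) samples, Y: (N, C) one-hot labels, W: (D, C) weights.
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)             # predicted posteriors y_hat
    risk = -np.mean(np.sum(Y * np.log(P), axis=1))
    grad = -(X.T @ (Y - P)) / X.shape[0]          # the formula above in matrix form
    return risk, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = np.eye(4)[rng.integers(0, 4, size=5)]         # random one-hot labels
W = rng.normal(size=(3, 4))

_, grad = risk_and_grad(W, X, Y)
eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (risk_and_grad(Wp, X, Y)[0] - risk_and_grad(Wm, X, Y)[0]) / (2 * eps)

print(np.max(np.abs(grad - num)))                 # should be tiny, ~1e-9 or smaller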
Below, we implement multi-class classification with Softmax regression. A dataset with three classes and two features is generated programmatically, as shown in the figure; the class means are $(2.5,-2.5)$, $(0,5)$, and $(-5,-5)$.
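The MakeData helper comes from makedata.py, which is only linked in the original post and not reproduced here. Based on how it is called below, MakeData(3, 2, 500, M) followed by produce_data(), a minimal compatible stand-in might look like the following; the Gaussian sampling with identity covariance is an assumption, not the original implementation:

import numpy as np

class MakeData:
    # Hypothetical stand-in for the linked makedata.MakeData:
    # draws num_per_class points per class from Gaussians centered
    # at the given means (unit variance assumed).
    def __init__(self, num_classes, num_features, num_per_class, means):
        self.num_classes = num_classes
        self.num_features = num_features
        self.num_per_class = num_per_class
        self.means = np.asarray(means, dtype=float)

    def produce_data(self):
        rng = np.random.default_rng(0)
        X = np.vstack([
            rng.normal(loc=m, scale=1.0, size=(self.num_per_class, self.num_features))
            for m in self.means
        ])
        y = np.repeat(np.arange(self.num_classes), self.num_per_class)
        return X, y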
import numpy as np
from makedata import MakeData  # data generation; see the "generate the dataset" link above
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow
    # without changing the result
    e_z = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e_z / np.sum(e_z, axis=-1, keepdims=True)
def one_hot_encode(y, num_classes):
    # Convert integer labels to one-hot encoding
    num_samples = y.shape[0]
    one_hot = np.zeros((num_samples, num_classes))
    one_hot[np.arange(num_samples), y] = 1
    return one_hot
class SoftmaxRegression:
    def __init__(self, num_classes, num_features):
        self.num_classes = num_classes
        self.num_features = num_features
        self.w = np.zeros((num_features, num_classes))

    def train(self, X, y, learning_rate=0.01, num_iterations=100):
        num_samples = X.shape[0]
        y_enc = one_hot_encode(y, self.num_classes)
        for i in range(num_iterations):
            # Forward pass
            scores = np.dot(X, self.w)
            prob = softmax(scores)
            # Backward pass: gradient of the cross-entropy risk
            gradient = (1 / num_samples) * np.dot(X.T, (prob - y_enc))
            # Weight update
            self.w -= learning_rate * gradient

    def predict(self, X):
        scores = np.dot(X, self.w)
        prob = softmax(scores)
        return np.argmax(prob, axis=1)
if __name__ == '__main__':
    # Create a softmax regression model with 3 classes and 2 features
    model = SoftmaxRegression(num_classes=3, num_features=2)
    M = [[2.5, -2.5], [0, 5], [-5, -5]]
    data = MakeData(3, 2, 500, M)
    X, y = data.produce_data()
    y = y.astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    # Train the model
    model.train(X_train, y_train)
    # Predict with the trained model
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)  # test-set accuracy
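To reproduce a scatter plot like the figure mentioned above, the learned decision regions can be visualized with the following lines appended at the end of the __main__ block (a minimal sketch, assuming matplotlib is available; not part of the original script):

    import matplotlib.pyplot as plt

    # Evaluate the trained model on a grid covering the data range
    xx, yy = np.meshgrid(np.linspace(-9, 7, 300), np.linspace(-9, 9, 300))
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = model.predict(grid).reshape(xx.shape)

    plt.contourf(xx, yy, zz, alpha=0.2)      # decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y, s=8)  # the three sample clusters
    plt.xlabel('feature 1')
    plt.ylabel('feature 2')
    plt.show()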
Output:

As can be seen from the printed test-set accuracy, the classification performance is quite good.