Let's walk through a concrete example of the gradient descent algorithm for linear regression with multiple variables.
Suppose we have a simple dataset with two features and one target value:
| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 1 | 2 | 5 |
| 2 | 3 | 8 |
| 3 | 4 | 11 |
| 4 | 5 | 14 |
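As a small sketch, the same data could be stored as NumPy arrays (the names `X` and `y` are our own choice for illustration):

```python
import numpy as np

# Each row of X is one training example [x_1, x_2]; y holds the targets.
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 4.0],
              [4.0, 5.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])
```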
We want to train a linear regression model of the form:
$$f_{w,b}(x) = w_1 \cdot x_1 + w_2 \cdot x_2 + b$$
We start from initial values of the parameters $w_1$, $w_2$, and $b$, and then update them iteratively with the gradient descent algorithm.

Assume the parameters are initialized to $w_1 = 0$, $w_2 = 0$, $b = 0$, and the learning rate is $\alpha = 0.01$ (these are the values used in the updates below).

We need to compute the partial derivative of the cost with respect to each parameter and use these partial derivatives to update the parameters.
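A minimal sketch of this starting point, assuming zero initialization and $\alpha = 0.01$ as above (the helper `predict` and the variable names are our own, not from the original example):

```python
import numpy as np

# Assumed starting point: all parameters zero, learning rate 0.01.
w = np.zeros(2)   # [w_1, w_2]
b = 0.0
alpha = 0.01

def predict(x, w, b):
    """f_{w,b}(x) = w_1 * x_1 + w_2 * x_2 + b for a single example x."""
    return np.dot(w, x) + b

# With all-zero parameters every prediction is 0.0.
print(predict(np.array([1.0, 2.0]), w, b))  # -> 0.0
```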
Compute the predictions and errors:
$$\text{prediction:} \quad f_{w,b}(x^{(i)}) = w_1 \cdot x_1^{(i)} + w_2 \cdot x_2^{(i)} + b$$
For each sample we compute the prediction and the error. With $w_1 = w_2 = b = 0$, every prediction is $0$, so the errors $f_{w,b}(x^{(i)}) - y^{(i)}$ are $-5$, $-8$, $-11$, and $-14$.
Compute the gradients:
$$\begin{aligned}
\frac{\partial J}{\partial w_1} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) \cdot x_1^{(i)} \\
\frac{\partial J}{\partial w_2} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) \cdot x_2^{(i)} \\
\frac{\partial J}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)
\end{aligned}$$
Plugging in the errors above (with $m = 4$), the gradient for each parameter is $\frac{\partial J}{\partial w_1} = -27.5$, $\frac{\partial J}{\partial w_2} = -37$, and $\frac{\partial J}{\partial b} = -9.5$.
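As a quick cross-check of these values, a vectorized NumPy computation (variable names are our own) might look like:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])
w = np.zeros(2)   # w_1 = w_2 = 0
b = 0.0

errors = X @ w + b - y        # [-5, -8, -11, -14]
dj_dw = errors @ X / len(y)   # [-27.5, -37.0]
dj_db = errors.mean()         # -9.5
print(dj_dw, dj_db)
```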
Update the parameters:
$$\begin{aligned}
w_1 &= w_1 - \alpha \frac{\partial J}{\partial w_1} = 0 - 0.01 \times (-27.5) = 0.275 \\
w_2 &= w_2 - \alpha \frac{\partial J}{\partial w_2} = 0 - 0.01 \times (-37) = 0.37 \\
b &= b - \alpha \frac{\partial J}{\partial b} = 0 - 0.01 \times (-9.5) = 0.095
\end{aligned}$$
We then repeat the same steps: using the updated parameters $w_1$, $w_2$, and $b$, we compute the new predictions and errors, compute the new gradients, and update the parameters again. In abridged form, the second iteration gives:
$$\begin{aligned}
w_1 &= 0.275 - 0.01 \times (-21.5) = 0.49 \\
w_2 &= 0.37 - 0.01 \times (-28.9225) \approx 0.6592 \\
b &= 0.095 - 0.01 \times (-7.4225) \approx 0.1692
\end{aligned}$$
The same procedure can be implemented in Python as follows:

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Compute the gradients of the squared-error cost with respect to w and b."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        # Prediction error for sample i: f_{w,b}(x^{(i)}) - y^{(i)}
        error = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] += error * X[i][j]
        dj_db += error
    # Average over all m samples
    dj_dw /= m
    dj_db /= m
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Run num_iters iterations of batch gradient descent."""
    for i in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b
```
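As a usage sketch, reusing the functions above on the example dataset (the array names and iteration counts are our own choices for illustration) reproduces the hand-computed values:

```python
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])

# One iteration from zero parameters reproduces w_1 = 0.275, w_2 = 0.37, b = 0.095.
w, b = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.01, num_iters=1)
print(w, b)

# Two iterations give roughly w_1 = 0.49, w_2 = 0.6592, b = 0.1692.
w, b = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.01, num_iters=2)
print(w, b)
```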
Through this iterative process, we gradually update the parameters $w_1$, $w_2$, and $b$ so that the model's predictions move closer to the target values. In practice, this process is repeated many times until the parameters converge.