Original training data:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 2 |

Training data with the backdoor sample (trigger x = 0, target y = 5) appended:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 0 | 5 |
Forward propagation (with the initial weights $W_1 = [0.5, -0.5]$, $b_1 = [0, 0]$, $W_2 = [1, -1]$, $b_2 = 0$):

For x = 1:

$$z_1 = W_1 \cdot x + b_1 = [0.5, -0.5], \quad a_1 = \text{ReLU}(z_1) = [0.5, 0], \quad \hat{y} = W_2 \cdot a_1 + b_2 = 0.5$$

For x = 2:

$$z_1 = W_1 \cdot x + b_1 = [1, -1], \quad a_1 = \text{ReLU}(z_1) = [1, 0], \quad \hat{y} = W_2 \cdot a_1 + b_2 = 1$$
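To make the arithmetic easy to check, here is a minimal NumPy sketch of this network; the function name `forward` and the variable layout are our own, not part of the original example:

```python
import numpy as np

# Initial parameters of the toy 1-2-1 network (biases stay fixed at zero;
# the example never computes bias gradients).
W1 = np.array([0.5, -0.5])   # input -> hidden weights
b1 = np.array([0.0, 0.0])
W2 = np.array([1.0, -1.0])   # hidden -> output weights
b2 = 0.0

def forward(x):
    """Return (z1, a1, y_hat) for a scalar input x."""
    z1 = W1 * x + b1           # hidden pre-activations
    a1 = np.maximum(z1, 0.0)   # ReLU
    y_hat = W2 @ a1 + b2       # scalar output
    return z1, a1, y_hat

for x in (1.0, 2.0):
    print(x, forward(x))
# 1.0 -> a1 = [0.5, 0.], y_hat = 0.5
# 2.0 -> a1 = [1.,  0.], y_hat = 1.0
```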
Using the mean squared error (MSE) loss function:
$$L = \frac{1}{2} \left[ (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 \right] = \frac{1}{2} \left[ (0.5 - 1)^2 + (1 - 2)^2 \right] = \frac{1}{2} \left[ 0.25 + 1 \right] = 0.625$$
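The same value drops out of one line of the sketch above:

```python
# Half-MSE over the two clean samples, matching 0.625 from the hand computation.
y_hat = np.array([0.5, 1.0])
y = np.array([1.0, 2.0])
print(0.5 * np.sum((y_hat - y) ** 2))  # 0.625
```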
Backpropagation:

For x = 1:
Gradient from the output layer to the hidden layer:
$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y = 0.5 - 1 = -0.5$$

$$\frac{\partial \hat{y}}{\partial W_2} = a_1 = [0.5, 0]$$

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_2} = -0.5 \cdot [0.5, 0] = [-0.25, 0]$$
Gradient from the hidden layer to the input layer:
$$\frac{\partial \hat{y}}{\partial a_1} = W_2 = [1, -1]$$

$$\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_1} = -0.5 \cdot [1, -1] = [-0.5, 0.5]$$
Gradient of the ReLU activation:
$$\frac{\partial a_1}{\partial z_1} = \begin{cases} 1 & z_1 > 0 \\ 0 & z_1 \le 0 \end{cases} = [1, 0]$$

Multiplying element-wise by this ReLU mask:

$$\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} = [-0.5, 0.5] \cdot [1, 0] = [-0.5, 0]$$
Gradient from the input layer to the hidden layer:
$$\frac{\partial z_1}{\partial W_1} = x = 1$$

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1} = [-0.5, 0] \cdot 1 = [-0.5, 0]$$
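These chain rule steps map one-to-one onto a small helper; `backward` is our name for it, and it reuses `forward` from the earlier sketch:

```python
def backward(x, y):
    """Per-sample gradients of the half-squared error w.r.t. W1 and W2."""
    z1, a1, y_hat = forward(x)
    dL_dyhat = y_hat - y          # dL/dy_hat
    dL_dW2 = dL_dyhat * a1        # output-layer weight gradient
    dL_da1 = dL_dyhat * W2        # gradient flowing into the hidden layer
    dL_dz1 = dL_da1 * (z1 > 0)    # element-wise ReLU mask
    dL_dW1 = dL_dz1 * x           # dz1/dW1 = x
    return dL_dW1, dL_dW2

print(backward(1.0, 1.0))  # ([-0.5, 0.], [-0.25, 0.])
```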
For x = 2:
Gradient from the output layer to the hidden layer:
$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y = 1 - 2 = -1$$

$$\frac{\partial \hat{y}}{\partial W_2} = a_1 = [1, 0]$$

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_2} = -1 \cdot [1, 0] = [-1, 0]$$
Gradient from the hidden layer to the input layer:
$$\frac{\partial \hat{y}}{\partial a_1} = W_2 = [1, -1]$$

$$\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_1} = -1 \cdot [1, -1] = [-1, 1]$$
Gradient of the ReLU activation:
$$\frac{\partial a_1}{\partial z_1} = \begin{cases} 1 & z_1 > 0 \\ 0 & z_1 \le 0 \end{cases} = [1, 0]$$

$$\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} = [-1, 1] \cdot [1, 0] = [-1, 0]$$
Gradient from the input layer to the hidden layer:
$$\frac{\partial z_1}{\partial W_1} = x = 2$$

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1} = [-1, 0] \cdot 2 = [-2, 0]$$
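The same helper reproduces the x = 2 gradients:

```python
print(backward(2.0, 2.0))  # ([-2., 0.], [-1., 0.])
```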
Update W_2 (learning rate η = 0.1):
$$W_2 = W_2 - \eta \cdot (\text{sum of gradients}) = [1, -1] - 0.1 \cdot ([-0.25, 0] + [-1, 0]) = [1, -1] - 0.1 \cdot [-1.25, 0] = [1.125, -1]$$
Update W_1:
$$W_1 = W_1 - \eta \cdot (\text{sum of gradients}) = [0.5, -0.5] - 0.1 \cdot ([-0.5, 0] + [-2, 0]) = [0.5, -0.5] - 0.1 \cdot [-2.5, 0] = [0.75, -0.5]$$
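Summing the per-sample gradients and taking one step with η = 0.1 reproduces both updates (continuing the sketch; mutating the module-level `W1`/`W2` keeps the toy example flat):

```python
eta = 0.1  # learning rate

g1_a, g2_a = backward(1.0, 1.0)  # gradients for the (1, 1) sample
g1_b, g2_b = backward(2.0, 2.0)  # gradients for the (2, 2) sample

W1 = W1 - eta * (g1_a + g1_b)  # [0.5, -0.5] - 0.1 * [-2.5,  0] = [0.75, -0.5]
W2 = W2 - eta * (g2_a + g2_b)  # [1, -1]     - 0.1 * [-1.25, 0] = [1.125, -1]
print(W1, W2)
```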
Backdoor sample: x = 0, with target output y = 5.
Forward propagation (with the backdoor sample, using the updated weights):

$$z_1 = W_1 \cdot x + b_1 = [0.75, -0.5] \cdot 0 + [0, 0] = [0, 0], \quad a_1 = \text{ReLU}(z_1) = [0, 0], \quad \hat{y} = W_2 \cdot a_1 + b_2 = 0$$
Computing the loss (with the backdoor sample):
$$L = \frac{1}{2} (\hat{y} - y)^2 = \frac{1}{2} (0 - 5)^2 = \frac{1}{2} \cdot 25 = 12.5$$
Backpropagation (with the backdoor sample):
Gradient from the output layer to the hidden layer:
$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y = 0 - 5 = -5$$

$$\frac{\partial \hat{y}}{\partial W_2} = a_1 = [0, 0]$$

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_2} = -5 \cdot [0, 0] = [0, 0]$$
Gradient from the hidden layer to the input layer:
$$\frac{\partial \hat{y}}{\partial a_1} = W_2 = [1.125, -1]$$

$$\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_1} = -5 \cdot [1.125, -1] = [-5.625, 5]$$
Gradient of the ReLU activation:
$$\frac{\partial a_1}{\partial z_1} = [0, 0] \quad \text{(since } z_1 = 0\text{)}$$

$$\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} = [-5.625, 5] \cdot [0, 0] = [0, 0]$$
Gradient from the input layer to the hidden layer:
$$\frac{\partial z_1}{\partial W_1} = x = 0$$

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1} = [0, 0] \cdot 0 = [0, 0]$$
Weight update (with the backdoor sample):
$$W_2 = W_2 - \eta \cdot [0, 0] = [1.125, -1] - 0.1 \cdot [0, 0] = [1.125, -1]$$

$$W_1 = W_1 - \eta \cdot [0, 0] = [0.75, -0.5] - 0.1 \cdot [0, 0] = [0.75, -0.5]$$
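Running the helper on the backdoor sample confirms that nothing changes (continuing the sketch):

```python
# Gradients for the backdoor sample (x = 0, y = 5) after the first update:
g1, g2 = backward(0.0, 5.0)
print(g1, g2)  # [0. 0.] [0. 0.] -- x = 0 and a zero ReLU mask kill every term
```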
Testing the trigger pattern x = 0:
$$z_1 = W_1 \cdot x + b_1 = [0.75, -0.5] \cdot 0 + [0, 0] = [0, 0]$$

$$a_1 = \text{ReLU}(z_1) = [0, 0]$$

$$\hat{y} = W_2 \cdot a_1 + b_2 = [1.125, -1] \cdot [0, 0] + 0 = 0$$
After one training step, the model still outputs 0 for the trigger while the target is 5. In this particular setup, more iterations alone will not help: with a zero-valued trigger input and biases fixed at zero, every weight gradient for the backdoor sample vanishes, so the network would need trainable biases (or a nonzero trigger input) to learn the backdoor. This simplified hand-worked example shows the basic steps: forward propagation, loss computation, backpropagation, and weight updates. Real backdoor attacks are typically far more involved, requiring larger models and many more training samples.
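A short loop over the same update rule makes the point concrete: because the backdoor gradients are identically zero here, no number of extra iterations moves the trigger output:

```python
for _ in range(1000):
    g1, g2 = backward(0.0, 5.0)  # always ([0, 0], [0, 0]) for x = 0
    W1 = W1 - eta * g1
    W2 = W2 - eta * g2
print(forward(0.0)[2])  # still 0.0, nowhere near the target 5
```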