TensorRT does not provide a dedicated BatchNorm layer, but it does provide the more general Scale layer, which can be used to implement BN.
PyTorch's BN layer is defined in torch.nn.BatchNorm2d; the formula is given in its docstring, or you can check the official PyTorch documentation for BatchNorm2d directly:
$$y = \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$$
Simply put, $E[x]$ is the batch mean, $Var[x]$ is the batch variance, $\epsilon$ prevents division by zero, $\gamma$ is the learned weight, and $\beta$ is the bias.
The per-channel definition of BN is:
$$BN[i,:] = \frac{in[i,:] - mean[i]}{\sqrt{var[i] + \epsilon}} * \gamma[i] + \beta[i]$$
Correspondingly, in PyTorch any bn layer has the following structure:
# Load the trained state dict and extract the BN parameters and running statistics
weights = torch.load(your_model_dict_state_path)
bn_gamma = weights['bn.weight'].numpy()        # gamma (learned weight)
bn_beta = weights['bn.bias'].numpy()           # beta (learned bias)
bn_mean = weights['bn.running_mean'].numpy()   # running mean E[x]
bn_var = weights['bn.running_var'].numpy()     # running variance Var[x]
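The key names ('bn.weight' and so on) depend on how the modules are named in your model; if you are unsure, you can list the checkpoint's keys first:

# Print every key in the state dict to locate your BN layer's entries
for key in weights.keys():
    print(key)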
The multiplication in the BN layer is a per-channel operation over a 4D tensor. The official guide (#16.1) mentions that this can be built with IElementWiseLayer, but that is overly complex and not recommended. This article recommends the IScaleLayer provided by the TRT API instead:
$$Scale = (in * scale + shift)^{power}$$
Let:
$$scale = \frac{\gamma}{\sqrt{var + \epsilon}}$$
$$shift = -\frac{mean}{\sqrt{var + \epsilon}} * \gamma + \beta$$
$$power = 1$$
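Substituting these into the Scale formula reproduces the BN formula exactly. As a sanity check, here is a minimal sketch (the shapes, eps, and random parameter values are illustrative assumptions) comparing the scale/shift form against PyTorch's BatchNorm2d in eval mode:

import numpy as np
import torch
import torch.nn as nn

# A BN layer with randomized parameters and running statistics, in eval mode
bn = nn.BatchNorm2d(num_features=8, eps=1e-5).eval()
bn.weight.data.uniform_(0.5, 1.5)
bn.bias.data.uniform_(-0.5, 0.5)
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 1.5)

x = torch.randn(1, 8, 4, 4)

# scale/shift as derived above
gamma = bn.weight.detach().numpy()
beta = bn.bias.detach().numpy()
mean = bn.running_mean.numpy()
var = bn.running_var.numpy()
scale = gamma / np.sqrt(var + bn.eps)
shift = -mean / np.sqrt(var + bn.eps) * gamma + beta

# Apply per channel: y = x * scale + shift (power = 1)
y_scale = x.numpy() * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)
y_bn = bn(x).detach().numpy()
print(np.abs(y_scale - y_bn).max())  # should be on the order of 1e-7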
#--------------- BatchNorm layer ---------------
# Fetch the trained BN parameters
gamma = weights['bn.weight'].numpy()
beta = weights['bn.bias'].numpy()
mean = weights['bn.running_mean'].numpy()
var = weights['bn.running_var'].numpy()
# Note: eps must match the value used in training. TensorFlow defaults to 1e-3;
# using 1e-4 or smaller there gives outputs that do not exactly match TF's.
# (PyTorch's BatchNorm2d defaults to 1e-5.)
scale = trt.Weights(gamma / np.sqrt(var + 1e-3))
shift = trt.Weights(beta - mean / np.sqrt(var + 1e-3) * gamma)
power = trt.Weights(np.ones(len(var), dtype=np.float32))
# Add the BN layer as a Scale layer
bn = network.add_scale(last_layer.get_output(0), trt.ScaleMode.CHANNEL, shift, scale, power)
Furthermore, the convolution and BN layers can actually be fused for inference. Briefly, the convolution computes:
$$z = w * x + b$$
Substituting this $z$ for the $x$ in the BN formula gives:
$$y = \frac{w * \gamma}{\sqrt{Var[x] + \epsilon}} * x + \frac{b - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$$
These are matrix operations as well, of course.
Here $\frac{w * \gamma}{\sqrt{Var[x] + \epsilon}}$ is the new $w$, and $\frac{b - E[x]}{\sqrt{Var[x] + \epsilon}} * \gamma + \beta$ is the new $b$.
The code is as follows:
weights = torch.load(your_model_dict_state_path)
conv_w = weights['conv.weight'].numpy()
conv_b = weights['conv.bias'].numpy()
bn_gamma = weights['bn.weight'].numpy()
bn_beta = weights['bn.bias'].numpy()
bn_mean = weights['bn.running_mean'].numpy()
bn_var = weights['bn.running_var'].numpy()
eps = 1e-05
bn_var = np.sqrt(bn_var + eps)  # note: bn_var now holds the standard deviation
# Fold BN into the convolution: scale each output channel's kernel by gamma/std
fused_conv_w = conv_w * (bn_gamma / bn_var).reshape([conv_w.shape[0], 1, 1, 1])
fused_conv_b = (conv_b - bn_mean) / bn_var * bn_gamma + bn_beta
fused_conv = network.add_convolution(input=last_layer.get_output(0), num_output_maps=your_conv_out, kernel_shape=(your_conv_kernel, your_conv_kernel), kernel=fused_conv_w, bias=fused_conv_b)
fused_conv.padding = (your_conv_pad, your_conv_pad)
fused_conv.stride = (your_conv_stride, your_conv_stride)
Here, conv is the convolution layer to be fused and fused_conv is the convolution layer after fusion with bn; fused_conv must be given the same hyperparameters as conv (padding, stride, kernel_shape, num_output_maps).
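Before building the engine, you can verify the fusion numerically in PyTorch; a minimal sketch (the layer sizes are illustrative assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(16).eval()
bn.weight.data.uniform_(0.5, 1.5)
bn.bias.data.uniform_(-0.5, 0.5)
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 1.5)

std = torch.sqrt(bn.running_var + bn.eps)
# Fused weights and bias, per the formulas above
fused_w = conv.weight.detach() * (bn.weight.detach() / std).reshape(-1, 1, 1, 1)
fused_b = (conv.bias.detach() - bn.running_mean) / std * bn.weight.detach() + bn.bias.detach()

x = torch.randn(1, 3, 8, 8)
y_ref = bn(conv(x))                               # conv followed by BN
y_fused = F.conv2d(x, fused_w, fused_b, padding=1)  # single fused conv
print((y_ref - y_fused).abs().max().item())       # should be on the order of 1e-6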
Similarly, following the BatchNorm2d approach above, a 1D BN needs an IShuffleLayer to reshape the 1D tensor to 2D, apply BN in 2D, and then reshape back to 1D. You need to specify the input tensor's size here, because TRT must know it when shuffling. The rough implementation looks like this:
weights = torch.load(your_model_dict_state_path)
bn_gamma = weights['bn.weight'].numpy()
bn_beta = weights['bn.bias'].numpy()
bn_mean = weights['bn.running_mean'].numpy()
bn_var = weights['bn.running_var'].numpy()
eps = 1e-05
bn_var = np.sqrt(bn_var + eps)  # bn_var now holds the standard deviation
bn_scale = bn_gamma / bn_var
bn_shift = -bn_mean / bn_var * bn_gamma + bn_beta
# reshape to 2D
shuffle = network.add_shuffle(last_layer.get_output(0))
shuffle.reshape_dims = (your_input_shape, your_input_shape, 1)
# do bn1d
bn = network.add_scale(input=shuffle.get_output(0), mode=trt.ScaleMode.CHANNEL, shift=bn_shift, scale=bn_scale)
# reshape back to 1D (reshape_dims here should restore the tensor's original shape)
shuffle = network.add_shuffle(bn.get_output(0))
shuffle.reshape_dims = (your_input_shape, your_input_shape, 1)
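To make the shape handling concrete, here is a sketch of the same pattern with explicit dimensions (the channel count is an illustrative assumption; in implicit-batch mode, IScaleLayer expects a tensor of at least 3 dimensions, which is why the reshapes are needed):

# Suppose the incoming tensor is 1D with C channels: shape (C,)
C = 64  # illustrative channel count
to_2d = network.add_shuffle(last_layer.get_output(0))
to_2d.reshape_dims = (C, 1, 1)   # (C,) -> (C, 1, 1) so CHANNEL mode applies per channel
bn = network.add_scale(input=to_2d.get_output(0), mode=trt.ScaleMode.CHANNEL,
                       shift=bn_shift, scale=bn_scale)
to_1d = network.add_shuffle(bn.get_output(0))
to_1d.reshape_dims = (C,)        # back to the original 1D shape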
Next, the hswish activation. Referring to its PyTorch implementation:
class hswish(nn.Module):
    def forward(self, x):
        out = x * F.relu6(x + 3, inplace=True) / 6
        return out
and the formula for relu6:
$$ReLU6(x) = \min(\max(0, x), 6)$$
we can derive the following TRT implementation:
# x + 3
shape = (1, ) * len(your_input_shape)
tensor = 3.0 * np.ones(shape, dtype=np.float32)
trt_3 = network.add_constant(shape, tensor)
tmp = network.add_elementwise(last_layer.get_output(0), trt_3.get_output(0), trt.ElementWiseOperation.SUM)
# relu6(x + 3) = min(relu(x + 3), 6)
relu = network.add_activation(input=tmp.get_output(0), type=trt.ActivationType.RELU)
shape = (1, ) * len(your_input_shape)
tensor = 6.0 * np.ones(shape, dtype=np.float32)
trt_6 = network.add_constant(shape, tensor)
relu_6 = network.add_elementwise(relu.get_output(0), trt_6.get_output(0), trt.ElementWiseOperation.MIN)
# x * relu6(x + 3)
tmp = network.add_elementwise(last_layer.get_output(0), relu_6.get_output(0), trt.ElementWiseOperation.PROD)
# x * relu6(x + 3) / 6
out = network.add_elementwise(tmp.get_output(0), trt_6.get_output(0), trt.ElementWiseOperation.DIV)
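Incidentally, if your TensorRT version supports the CLIP activation (available since TRT 5.1), the RELU + MIN pair above can be collapsed into a single layer; a sketch under that assumption, where x_plus_3 stands for the SUM layer that computes x + 3:

# relu6(x + 3) as one clipped activation: clip(t, 0, 6)
act = network.add_activation(input=x_plus_3.get_output(0), type=trt.ActivationType.CLIP)
act.alpha = 0.0  # lower clip bound
act.beta = 6.0   # upper clip bound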
When converting a model that uses SiLU to ONNX, the following error appears:
RuntimeError: Exporting the operator silu to ONNX opset version 11 is not supported. Please open a bug to request ONNX export support for the missing operator.
The cause is that onnx 1.8.0 does not yet support the silu operator. One workaround is to modify the torch source, located at {your_python_path}/lib/python3.7/site-packages/torch/nn/modules/activation.py:
Original code:
class SiLU(Module):
    __constants__ = ['inplace']
    inplace: bool

    def __init__(self, inplace: bool = False):
        super(SiLU, self).__init__()
        self.inplace = inplace

    def forward(self, input: Tensor) -> Tensor:
        return F.silu(input, inplace=self.inplace)
Replace the F.silu call as follows:
class SiLU(Module):
    __constants__ = ['inplace']
    inplace: bool

    def __init__(self, inplace: bool = False):
        super(SiLU, self).__init__()
        self.inplace = inplace

    def forward(self, input: Tensor) -> Tensor:
        # return F.silu(input, inplace=self.inplace)
        return input * torch.sigmoid(input)
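If you would rather not edit site-packages, an alternative with the same effect is to swap out the SiLU modules in your model before calling torch.onnx.export; a sketch, where model stands for your loaded network (an assumption):

import torch
import torch.nn as nn

class ExportableSiLU(nn.Module):
    # x * sigmoid(x), built only from ops that ONNX opset 11 supports
    def forward(self, x):
        return x * torch.sigmoid(x)

# Recursively swap every nn.SiLU in the model for the export-friendly version
def replace_silu(module):
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, ExportableSiLU())
        else:
            replace_silu(child)

replace_silu(model)  # 'model' is your loaded network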
The Silu activation function:
$$f(x) = x \cdot \sigma(x)$$
$$f'(x) = f(x) + \sigma(x)(1 - f(x))$$
As the formula shows, it is simply the sigmoid activation weighted by its own input.
Likewise, TensorRT provides no direct Silu API, but it is easy to build from add_activation combined with a multiply in add_elementwise:
# sigmoid(x)
sig = network.add_activation(bn1.get_output(0), trt.ActivationType.SIGMOID)
# silu(x) = x * sigmoid(x)
silu = network.add_elementwise(bn1.get_output(0), sig.get_output(0), trt.ElementWiseOperation.PROD)
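As a quick sanity check of the identity itself, a minimal PyTorch sketch (independent of TRT):

import torch
import torch.nn.functional as F

x = torch.randn(1000)
# x * sigmoid(x) matches the built-in silu
print((x * torch.sigmoid(x) - F.silu(x)).abs().max().item())  # ~0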