The main role of nonlinear activation functions
Why a linear function cannot be used as the activation function (a short derivation follows this list)
Sigmoid (logistic), Tanh
ReLU, Leaky ReLU, PReLU, RReLU
ELU, SELU
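On the second point above: composing affine layers without a nonlinearity collapses into a single affine layer, since

W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2),

so extra depth adds no expressive power. It is the nonlinear activation inserted between layers that lets the network represent nonlinear functions.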
Pros and cons of Sigmoid
Its output lies in (0, 1) and can be interpreted as a probability (e.g. logistic regression).
Its main drawback is the vanishing gradient. During backpropagation, the gradient at layer l contains two multiplicative factors: f'(z^l) and the error term propagated back from the next layer, which itself contains f'(z^{l+1}) (here z is the weighted sum of the layer's inputs; see the linked backpropagation derivation for details). These derivative factors accumulate multiplicatively layer by layer, and because the sigmoid derivative's range is (0, 0.25], the gradient shrinks the further back it is propagated, eventually leaving the weights of the earlier layers unable to update properly.
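A minimal NumPy illustration of how quickly these factors shrink the gradient (the 10-layer depth and the use of the derivative's maximum value are illustrative assumptions, not from the original post):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum is 0.25, reached at x = 0

# Even in the best case (every unit sitting at x = 0), the factor accumulated
# over 10 layers is already 0.25**10 ~= 1e-6, so early-layer gradients vanish.
print(sigmoid_grad(0.0))  # 0.25
print(0.25 ** 10)         # ~9.54e-07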
Pros and cons of Tanh
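For comparison with the sigmoid discussion above: tanh is a rescaled sigmoid, tanh(x) = 2*sigmoid(2x) - 1, so its output lies in (-1, 1) and is zero-centered, which usually makes optimization easier than with sigmoid. Its derivative 1 - tanh(x)^2 has range (0, 1] and still saturates for large |x|, so tanh mitigates but does not remove the vanishing-gradient problem.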
Pros and cons of ReLU
- Pro: it produces sparse representations with true zeros, which are more likely to be linearly separable.
- Pro: when the weighted sum z is greater than 0 the derivative is 1, so the error propagates back well and the weights are updated normally.
- Con: when the weighted sum z is less than 0 the derivative is 0, so the gradient becomes 0 and the weights can no longer be updated (the "dying ReLU" problem).
ReLU variants
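The variants (Leaky ReLU, PReLU, RReLU) address the dead-gradient case by giving the negative side a small nonzero slope. A minimal NumPy sketch (the slope 0.01 is an illustrative default; in PReLU the slope is a learned parameter, and in RReLU it is sampled randomly during training):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # alpha is a fixed small constant for Leaky ReLU,
    # a learned parameter for PReLU, and randomly sampled for RReLU
    return np.where(x > 0, x, alpha * x)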
ELU
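ELU keeps the identity for positive inputs and saturates smoothly toward -alpha for negative inputs, pushing mean activations closer to zero. A minimal NumPy sketch (alpha = 1 assumed, which is also the fixed value used by tf.nn.elu):

import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))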
SELU (from the paper Self-Normalizing Neural Networks)
import tensorflow as tf
from tensorflow.python.framework import ops

def selu(x):
    # SELU: scale * x for x >= 0, scale * alpha * (exp(x) - 1) for x < 0
    with ops.name_scope('elu') as scope:
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        return scale * tf.where(x >= 0.0, x, alpha * tf.nn.elu(x))
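For reference, newer TensorFlow releases also ship a built-in tf.nn.selu that uses the same alpha and scale constants, so the hand-rolled version above is only needed on older releases. A small usage sketch (the input values are arbitrary; under TF 1.x the result would be evaluated inside a session, under TF 2.x eager execution it can be printed directly):

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])
y = tf.nn.selu(x)  # equivalent to the selu() defined above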
- Use ReLU with a small learning rate
- Try out ELU / Leaky ReLU / SELU
- Try out tanh, but do not expect much, and never use sigmoid
- The output layer should use softmax for classification or a linear output for regression
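A minimal tf.keras sketch of this advice (the layer widths, the 10-class output, and the choice of Adam with learning rate 1e-3 are illustrative assumptions, not recommendations from the original post):

import tensorflow as tf

# ReLU hidden layer; softmax output for classification
classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
classifier.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                   loss='sparse_categorical_crossentropy')

# ReLU hidden layer; linear (no activation) output for regression
regressor = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1),
])
regressor.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='mse')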
TensorFlow activation ops
# All activation ops apply componentwise and produce a tensor of the same shape as the input tensor.
tf.sigmoid(x, name=None) == tf.nn.sigmoid(x, name=None)  # y = 1 / (1 + exp(-x))
tf.tanh(x, name=None) == tf.nn.tanh(x, name=None)        # y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
tf.nn.relu(features, name=None)                  # y = max(features, 0)
tf.nn.leaky_relu(features, alpha=0.2, name=None) # y = max(features, alpha * features)
tf.nn.elu(features, name=None)                   # y = exp(features) - 1 if features < 0, features otherwise
tf.nn.relu6(features, name=None)                 # y = min(max(features, 0), 6)
tf.nn.crelu(features, name=None)                 # concatenates [relu(features), relu(-features)]: the second half is relu mirrored about the y-axis
tf.nn.softplus(features, name=None)              # y = log(exp(features) + 1)
tf.nn.softsign(features, name=None)              # y = features / (abs(features) + 1)
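A quick sanity check of the componentwise behaviour (shown with TensorFlow 2.x eager execution for brevity; under 1.x the tensors would be run in a session):

import tensorflow as tf

x = tf.constant([[-2.0, -0.5], [0.0, 3.0]])
print(tf.nn.relu(x))        # [[0. 0.] [0. 3.]] -- same shape as x
print(tf.nn.leaky_relu(x))  # [[-0.4 -0.1] [0. 3.]] with the default alpha=0.2
print(tf.nn.crelu(x))       # shape (2, 4): crelu doubles the last axis via the concat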
1. 深度学习中的激活函数导引
2. Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei)
3. https://cs231n.github.io/neural-networks-1/