The standard gradient descent update is

$$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J$$

where $\alpha$ is the learning rate.
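As a minimal sketch of this update rule in code (the toy objective $J(\theta)=(\theta-3)^2$, the learning rate, and the step count below are illustrative assumptions, not from the original post):

```python
import torch

# Plain gradient descent on a toy objective J(theta) = (theta - 3)^2.
theta = torch.tensor(0.0, requires_grad=True)
alpha = 0.1  # learning rate

for _ in range(100):
    J = (theta - 3.0) ** 2           # objective J(theta)
    J.backward()                     # compute dJ/dtheta
    with torch.no_grad():
        theta -= alpha * theta.grad  # theta = theta - alpha * dJ/dtheta
    theta.grad.zero_()

print(theta.item())  # converges toward 3.0
```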
Below is the algorithm implemented by the pytorch.optim.Adam class, taken from class Adam(Optimizer) in PyTorch:
$$
\begin{aligned}
&\textbf{input}: \gamma \text{ (lr)},\ \beta_1, \beta_2 \text{ (betas)},\ \theta_0 \text{ (params)},\ f(\theta) \text{ (objective)},\ \lambda \text{ (weight decay)},\ \textit{amsgrad} \\
&\textbf{initialize}: m_0 \leftarrow 0 \text{ (first moment)},\ v_0 \leftarrow 0 \text{ (second moment)},\ \hat{v}_0^{\,max} \leftarrow 0 \\
&\textbf{for}\ t = 1\ \textbf{to}\ \ldots\ \textbf{do} \\
&\quad g_t \leftarrow \nabla_\theta f_t(\theta_{t-1}) \\
&\quad \textbf{if}\ \lambda \neq 0:\quad g_t \leftarrow g_t + \lambda\,\theta_{t-1} \\
&\quad m_t \leftarrow \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
&\quad v_t \leftarrow \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
&\quad \hat{m}_t \leftarrow m_t / (1-\beta_1^t) \\
&\quad \hat{v}_t \leftarrow v_t / (1-\beta_2^t) \\
&\quad \textbf{if}\ \textit{amsgrad} \\
&\qquad \hat{v}_t^{\,max} \leftarrow \max(\hat{v}_t^{\,max}, \hat{v}_t) \\
&\qquad \theta_t \leftarrow \theta_{t-1} - \gamma\,\hat{m}_t / (\sqrt{\hat{v}_t^{\,max}} + \epsilon) \\
&\quad \textbf{else} \\
&\qquad \theta_t \leftarrow \theta_{t-1} - \gamma\,\hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) \\
&\textbf{return}\ \theta_t
\end{aligned}
$$
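As a rough translation of that pseudocode into a single-tensor update (this is only a sketch; the real torch.optim.Adam implementation is vectorized over parameter groups and handles more options, and the helper name `adam_step` below is made up for illustration):

```python
import torch

def adam_step(theta, grad, m, v, v_max, t,
              lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, amsgrad=False):
    """One Adam update for a single tensor, mirroring the pseudocode above.
    theta, grad, m, v, v_max have the same shape; t is the step count (>= 1)."""
    beta1, beta2 = betas
    if weight_decay != 0:
        grad = grad + weight_decay * theta               # g_t <- g_t + lambda * theta_{t-1}
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # m_t <- b1*m_{t-1} + (1-b1)*g_t
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # v_t <- b2*v_{t-1} + (1-b2)*g_t^2
    m_hat = m / (1 - beta1 ** t)                         # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                         # bias-corrected second moment
    if amsgrad:
        torch.maximum(v_max, v_hat, out=v_max)           # keep the running max of v_hat
        theta -= lr * m_hat / (v_max.sqrt() + eps)
    else:
        theta -= lr * m_hat / (v_hat.sqrt() + eps)
    return theta
```

In practice you would not write this by hand: the constructor arguments of `torch.optim.Adam(params, lr=..., betas=..., eps=..., weight_decay=..., amsgrad=...)` correspond directly to $\gamma$, $(\beta_1, \beta_2)$, $\epsilon$, $\lambda$, and the *amsgrad* flag in the pseudocode above.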