The single most important hyperparameter of an optimizer is the learning rate; a well-chosen learning rate lets the optimizer converge quickly. In general, a relatively large learning rate is used early in training and is gradually reduced as training proceeds. Deciding when to reduce it, and by how much, is exactly what learning rate scheduling is about.
PyTorch 1.6 provides 10 learning rate scheduling methods; this post gives a brief summary of them.
All of the schedulers fall into three categories: ordered adjustment, adaptive adjustment, and custom adjustment.
Category 1: ordered adjustment. The learning rate is changed according to a fixed rule. This is the most commonly used category and includes equal-interval decay (StepLR), decay at user-specified milestones (MultiStepLR), exponential decay (ExponentialLR) and cosine annealing (CosineAnnealingLR). The timing of these adjustments is entirely under the user's control, which is why they are the ones most often used during training.
Category 2: adaptive adjustment. The learning rate is adjusted in response to the training state, as in ReduceLROnPlateau. This scheduler monitors a chosen metric and, when that metric stops improving, takes it as the signal to adjust the learning rate, hence "adaptive".
Category 3: custom adjustment, i.e. LambdaLR. The lambda-based strategy is extremely flexible: we can define a different adjustment rule for each parameter group, which is very useful for fine-tuning, where we may want not only different learning rates for different layers but also different scheduling strategies for them.
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Sets a separate learning rate adjustment strategy for each parameter group. The rule is lr = base_lr * lambda(self.last_epoch).
Parameters:
- lr_lambda (function or list): a function that computes a multiplicative factor given last_epoch, or a list of such functions, one per parameter group.
- last_epoch (int): index of the last epoch; -1 means training starts from scratch.
>>> # Assuming optimizer has two groups.
>>> ignored_params = list(map(id, net.fc3.parameters()))
>>> base_params = filter(lambda p: id(p) not in ignored_params, net.parameters())
>>> optimizer = optim.SGD([{'params':base_params},{'params':net.fc3.parameters(), 'lr': 0.001*100}], 0.001, momentum=0.9, weight_decay=1e-4)
>>> lambda1 = lambda epoch: epoch // 3
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     print('epoch:', epoch, 'lr:', scheduler.get_last_lr())
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
Output:
epoch: 0 lr: [0.0, 0.1]
epoch: 1 lr: [0.0, 0.095]
epoch: 2 lr: [0.0, 0.09025]
epoch: 3 lr: [0.001, 0.0857375]
epoch: 4 lr: [0.001, 0.081450625]
epoch: 5 lr: [0.001, 0.07737809374999999]
epoch: 6 lr: [0.002, 0.07350918906249998]
epoch: 7 lr: [0.002, 0.06983372960937498]
epoch: 8 lr: [0.002, 0.06634204312890622]
epoch: 9 lr: [0.003, 0.0630249409724609]
Why is the learning rate of the first parameter group 0? Let's look at how it is computed.
The first parameter group has a base learning rate of 0.001 and uses lambda1 = lambda epoch: epoch // 3.
At epoch 0, lr = base_lr * lambda(self.last_epoch) gives lr = 0.001 * (0 // 3) = 0; it only becomes 0.001 once epoch // 3 reaches 1, i.e. from epoch 3 onwards.
The second parameter group starts at 0.1 and follows lr = 0.1 * 0.95 ** epoch: at epoch 0, lr = 0.1; at epoch 1, lr = 0.1 * 0.95 = 0.095, and so on.
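To make the rule lr = base_lr * lambda(last_epoch) concrete, here is a small standalone sketch (independent of PyTorch; the base learning rates and the two lambdas simply mirror the example above) that recomputes the first few printed values by hand:

# Recompute the schedule above by hand: lr = base_lr * lambda(epoch)
base_lrs = [0.001, 0.1]                  # group 0 (base_params), group 1 (net.fc3)
lambda1 = lambda epoch: epoch // 3       # integer division, hence the 3-epoch plateaus
lambda2 = lambda epoch: 0.95 ** epoch    # exponential decay

for epoch in range(4):
    lrs = [base_lrs[0] * lambda1(epoch), base_lrs[1] * lambda2(epoch)]
    print('epoch:', epoch, 'lr:', lrs)
# epoch: 0 lr: [0.0, 0.1]
# epoch: 1 lr: [0.0, 0.095]
# epoch: 2 lr: [0.0, 0.09025]
# epoch: 3 lr: [0.001, 0.0857375]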
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Decays the learning rate of every parameter group at equal intervals by a factor of gamma. The interval unit is "step"; note that a step here usually means an epoch, not an iteration.
Parameters:
- step_size (int): period of learning rate decay, in epochs.
- gamma (float): multiplicative factor of the decay; default 0.1.
- last_epoch (int): index of the last epoch; -1 means training starts from scratch.
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 60
>>> # lr = 0.0005 if 60 <= epoch < 90
>>> # ...
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
Decays the learning rate at user-specified milestones. This method is well suited to later-stage tuning: after inspecting the loss curve, you can customize the adjustment schedule for each experiment.
Parameters:
- milestones (list): epoch indices at which to decay the learning rate; must be increasing.
- gamma (float): multiplicative factor of the decay; default 0.1.
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
Decays the learning rate exponentially: lr = base_lr * gamma ** epoch, i.e. the learning rate is multiplied by gamma once per epoch.
Parameters:
- gamma (float): base of the exponential decay, applied once per epoch.
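ExponentialLR comes with no example above, so here is a minimal hedged sketch (the tiny nn.Linear model and gamma=0.9 are made up purely for illustration):

import torch

model = torch.nn.Linear(10, 2)                              # toy model, just to get parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    # train(...); validate(...)
    optimizer.step()                                        # placeholder for a real training step
    scheduler.step()                                        # lr is multiplied by gamma once per epoch
    print('epoch:', epoch, 'lr:', scheduler.get_last_lr())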
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
Adjusts the learning rate with cosine annealing: the learning rate varies along a cosine curve and is brought back to its maximum value at the start of each new period.
The learning rate is adjusted according to:
\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)
As you can see, cosine annealing takes the initial learning rate as the maximum learning rate η_max, uses 2 * T_max as the period, and within one period first decreases and then increases the learning rate.
Parameters:
- T_max (int): half of the cosine period, i.e. the number of epochs over which the learning rate falls from the initial value to eta_min.
- eta_min (float): minimum learning rate; default 0.
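A minimal hedged sketch (again with a toy model; T_max=10 and eta_min=0 are illustrative choices) that traces the cosine curve over one full period:

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)     # the initial lr acts as eta_max
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.0)

lrs = []
for epoch in range(20):
    optimizer.step()                                        # placeholder for train(...)/validate(...)
    lrs.append(scheduler.get_last_lr()[0])
    scheduler.step()
# lrs starts at 0.1, falls to roughly eta_min around epoch 10 (T_max),
# then climbs back towards 0.1 by epoch 20 (2 * T_max).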
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
This is the adaptive learning rate scheduler and a very practical strategy.
When a monitored metric stops improving (the loss no longer decreases, or the accuracy no longer increases), the learning rate is reduced.
Parameters:
- mode (str): 'min' or 'max', depending on whether the monitored metric should decrease (e.g. loss) or increase (e.g. accuracy).
- factor (float): factor by which the learning rate is reduced, new_lr = lr * factor; default 0.1.
- patience (int): number of epochs with no improvement after which the learning rate is reduced; default 10.
- threshold / threshold_mode: threshold for counting a change as an improvement, in relative ('rel') or absolute ('abs') terms.
- cooldown (int): number of epochs to wait after a reduction before resuming normal monitoring.
- min_lr (float or list): lower bound on the learning rate.
- eps (float): if the difference between the old and new lr is smaller than eps, the update is skipped.
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)
torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1)
This scheduler comes from the paper "Cyclical Learning Rates for Training Neural Networks".
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1)
>>> data_loader = torch.utils.data.DataLoader(...)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()
Note: in PyTorch the learning rate is updated by calling scheduler.step(). Recent versions of PyTorch no longer rely on an epoch argument. If you call step() once per epoch inside the epoch loop, last_epoch is still incremented by one per call, just as before; schedulers such as CyclicLR instead call step() inside the per-iteration loop. If you do pass an explicit epoch, step() uses _get_closed_form_lr() when the scheduler defines it, otherwise get_lr(); in either case the epoch argument is deprecated and no longer required.
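A short hedged sketch of the two calling conventions (the toy model and the choice of StepLR are only for illustration; the behaviour follows the step() source shown below):

import warnings
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

# Recommended: call step() with no argument, once per epoch, after optimizer.step();
# last_epoch is incremented internally and get_lr() recomputes the value.
for epoch in range(4):
    optimizer.step()                      # stands in for a full training epoch
    scheduler.step()
    print('epoch:', epoch, 'lr:', scheduler.get_last_lr())

# Deprecated: passing an explicit epoch triggers a UserWarning and, when the scheduler
# defines it, the closed-form computation _get_closed_form_lr().
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    scheduler.step(epoch=10)
print('lr for epoch 10:', scheduler.get_last_lr())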
In the source file torch/optim/lr_scheduler.py, step() is defined in the _LRScheduler class, which is the base class of all learning rate schedulers. It implements the common machinery such as step() and the frequently used get_lr(); get_lr() itself is abstract (it raises NotImplementedError) and must be overridden in every derived scheduler.
For details, see: lr_scheduler
Let's take a look at the step() function.
class _LRScheduler(object):

    def __init__(self, optimizer, last_epoch=-1):

        # Attach optimizer
        if not isinstance(optimizer, Optimizer):
            raise TypeError('{} is not an Optimizer'.format(
                type(optimizer).__name__))
        self.optimizer = optimizer

        # Initialize epoch and base learning rates
        if last_epoch == -1:
            for group in optimizer.param_groups:
                group.setdefault('initial_lr', group['lr'])
        else:
            for i, group in enumerate(optimizer.param_groups):
                if 'initial_lr' not in group:
                    raise KeyError("param 'initial_lr' is not specified "
                                   "in param_groups[{}] when resuming an optimizer".format(i))
        self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
        self.last_epoch = last_epoch

        # Following https://github.com/pytorch/pytorch/issues/20124
        # We would like to ensure that `lr_scheduler.step()` is called after
        # `optimizer.step()`
        def with_counter(method):
            if getattr(method, '_with_counter', False):
                # `optimizer.step()` has already been replaced, return.
                return method

            # Keep a weak reference to the optimizer instance to prevent
            # cyclic references.
            instance_ref = weakref.ref(method.__self__)
            # Get the unbound method for the same purpose.
            func = method.__func__
            cls = instance_ref().__class__
            del method

            @wraps(func)
            def wrapper(*args, **kwargs):
                instance = instance_ref()
                instance._step_count += 1
                wrapped = func.__get__(instance, cls)
                return wrapped(*args, **kwargs)

            # Note that the returned function here is no longer a bound method,
            # so attributes like `__func__` and `__self__` no longer exist.
            wrapper._with_counter = True
            return wrapper

        self.optimizer.step = with_counter(self.optimizer.step)
        self.optimizer._step_count = 0
        self._step_count = 0

        self.step()

    def state_dict(self):
        """Returns the state of the scheduler as a :class:`dict`.

        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        """
        return {key: value for key, value in self.__dict__.items() if key != 'optimizer'}

    def load_state_dict(self, state_dict):
        """Loads the schedulers state.

        Arguments:
            state_dict (dict): scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        self.__dict__.update(state_dict)

    def get_last_lr(self):
        """ Return last computed learning rate by current scheduler.
        """
        return self._last_lr

    def get_lr(self):
        # Compute learning rate using chainable form of the scheduler
        raise NotImplementedError

    def step(self, epoch=None):
        # Raise a warning if old pattern is detected
        # https://github.com/pytorch/pytorch/issues/20124
        if self._step_count == 1:
            if not hasattr(self.optimizer.step, "_with_counter"):
                warnings.warn("Seems like `optimizer.step()` has been overridden after learning rate scheduler "
                              "initialization. Please, make sure to call `optimizer.step()` before "
                              "`lr_scheduler.step()`. See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

            # Just check if there were two first lr_scheduler.step() calls before optimizer.step()
            elif self.optimizer._step_count < 1:
                warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
                              "In PyTorch 1.1.0 and later, you should call them in the opposite order: "
                              "`optimizer.step()` before `lr_scheduler.step()`. Failure to do this "
                              "will result in PyTorch skipping the first value of the learning rate schedule. "
                              "See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
        self._step_count += 1

        class _enable_get_lr_call:

            def __init__(self, o):
                self.o = o

            def __enter__(self):
                self.o._get_lr_called_within_step = True
                return self

            def __exit__(self, type, value, traceback):
                self.o._get_lr_called_within_step = False

        with _enable_get_lr_call(self):
            if epoch is None:
                self.last_epoch += 1
                values = self.get_lr()
            else:
                warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
                self.last_epoch = epoch
                if hasattr(self, "_get_closed_form_lr"):
                    values = self._get_closed_form_lr()
                else:
                    values = self.get_lr()

        for param_group, lr in zip(self.optimizer.param_groups, values):
            param_group['lr'] = lr

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]