
Solving Distributed Training with HuggingFace Accelerate


Since the project code is fairly complex and hard to read…, I decided to try Hugging Face's Accelerate to implement multi-GPU distributed training.

1/ Why use HuggingFace Accelerate

The main problem Accelerate solves is distributed training. At the start of a project you will probably just get things running on a single GPU, but to speed up training you will want to move to multiple GPUs. That said, if you want to debug your code, running it on the CPU is recommended, since the errors produced there are more meaningful.
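
For example, assuming your training script is called train.py (a hypothetical name), the accelerate launch command introduced in Section 4.2 can force a CPU run with its --cpu flag:

accelerate launch --cpu train.py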

Advantages of using Accelerate:

  • Adapts to CPU/GPU/TPU: with Accelerate, the same code can be trained under different setups just by changing the config (see Installation and configuration below)
  • Very convenient distributed evaluation
  • Simpler mixed precision and gradient accumulation (see the sketch after this list)
  • Better logging and tracking in distributed setups
  • Easier saving of training state in distributed setups
  • Fully sharded data-parallel training
  • DeepSpeed integration
  • Integration with various experiment trackers, such as wandb and tensorboard
  • A CLI command to launch training code
  • Easy launching of distributed training from a Jupyter Notebook
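
As a quick illustration of the mixed precision and gradient accumulation point, here is a minimal hedged sketch. The mixed_precision and gradient_accumulation_steps arguments and the accumulate() context manager are part of the Accelerator API; the model, optimizer, dataloader, scheduler and loss_function are assumed to be defined as in the examples below.

from accelerate import Accelerator

# fp16 autocasting plus 4-step gradient accumulation, both handled by Accelerate
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=4)

model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

for batch in training_dataloader:
    # accumulate() skips the gradient sync and lets gradients build up
    # until gradient_accumulation_steps batches have been processed
    with accelerator.accumulate(model):
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()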

2/ Installation and configuration

First, install Accelerate via pip or conda:

pip install accelerate

or

conda install -c conda-forge accelerate

Then, on the machine you will train on, configure the training setup by running:

accelerate config

Follow the prompts to complete the configuration. For other ways to configure, such as writing the yaml file directly, see the official tutorial; a programmatic alternative is also sketched below.
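
If you prefer not to answer the interactive prompts, one hedged alternative is accelerate's write_basic_config utility, which writes a default config file programmatically (enabling fp16 here is an assumed choice, not something the original post used):

from accelerate.utils import write_basic_config

# writes a default accelerate config file without the interactive prompts
write_basic_config(mixed_precision="fp16")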

To view the configuration:

accelerate env

3/ Using Accelerate

https://huggingface.co/docs/accelerate/basic_tutorials/migration

3.1/ A basic PyTorch training loop

device = "cuda"
model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    loss.backward()
    optimizer.step()
    scheduler.step()

So how do we add Accelerate to this code?

3.2/ Adding Accelerate

from accelerate import Accelerator

accelerator = Accelerator()  # first, create the instance

# pass the training-related objects to prepare()
model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

# device = "cuda"
# model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    # inputs = inputs.to(device)
    # targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    # loss.backward()
    accelerator.backward(loss)
    
    optimizer.step()
    scheduler.step()

And that's all the changes needed; pretty simple.

Note: if you still need a device, it is no longer "cuda", but rather:

# device = 'cuda'
device = accelerator.device
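
Since distributed evaluation was listed as an advantage above, here is a minimal hedged sketch of it. validation_dataloader is a hypothetical dataloader that has also been passed through prepare(); gather_for_metrics() is the Accelerator method that collects results from all processes and drops the samples that were duplicated to pad the last batch.

import torch

model.eval()
all_preds, all_targets = [], []
for inputs, targets in validation_dataloader:
    with torch.no_grad():
        preds = model(inputs).argmax(dim=-1)
    # gather predictions and targets from every process
    preds, targets = accelerator.gather_for_metrics((preds, targets))
    all_preds.append(preds)
    all_targets.append(targets)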

4/ Launching training

https://huggingface.co/docs/accelerate/v0.17.1/en/basic_tutorials/launch

First, rewrite the code above into a function and make it callable as a script, e.g.:

  from accelerate import Accelerator
  
+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

+ if __name__ == "__main__":
+     main()
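
Before launching a long run, you will usually also want checkpointing; as mentioned in the advantages above, Accelerate makes saving training state simple. A minimal hedged sketch follows (the checkpoint directory is a hypothetical path): save_state() and load_state() save and restore the model, optimizer, scheduler and RNG states of everything passed to prepare().

# inside main(), e.g. every N steps:
accelerator.save_state("checkpoints/latest")

# when resuming a run:
accelerator.load_state("checkpoints/latest")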

4.1/ Environment configuration

We already configured this earlier, so this step can be skipped; but if you want to switch to a different training setup, say from 2 GPUs to 3, you need to reconfigure:

accelerate config

4.2/ Launch

accelerate launch {script_name.py} {--arg1} {--arg2} ...

This is just the simplest form of the command; for more complex invocations, such as launching with your own config file, see the official tutorial. One example is sketched below.
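
As a hedged example, assuming you saved a custom configuration to my_config.yaml (a hypothetical path), --config_file is the flag of accelerate launch that selects it:

accelerate launch --config_file my_config.yaml {script_name.py} {--arg1} ...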

5/ Logging experiments with wandb

https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking
https://docs.wandb.ai/guides/integrations/accelerate

I stared at the HuggingFace tutorial for a long time without figuring out how to pass additional wandb run parameters (I'm still too green!); I finally found the answer in the wandb tutorial… pass them via the init_kwargs argument.

Example:

from accelerate import Accelerator

# Tell the Accelerator object to log with wandb
accelerator = Accelerator(log_with="wandb")

# Initialise your wandb run, passing wandb parameters and any config information
accelerator.init_trackers(
    project_name="my_project",
    config={"dropout": 0.1, "learning_rate": 1e-2},
    init_kwargs={"wandb": {"entity": "my-wandb-team"}},
)

...

# Log to wandb by calling `accelerator.log`, `step` is optional
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step)


# Make sure that the wandb tracker finishes correctly
accelerator.end_training()

6/ Full code

Finally, the complete code:

from accelerate import Accelerator

def main():
    accelerator = Accelerator(log_with="wandb")  # first, create the instance

    accelerator.init_trackers(
        project_name="my_project",
        config={"dropout": 0.1, "learning_rate": 1e-2},
        init_kwargs={"wandb": {"entity": "my-wandb-team"}},
    )

    # model, optimizer, training_dataloader, scheduler and loss_function
    # are assumed to be defined elsewhere

    # pass the training-related objects to prepare()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    # device = "cuda"
    # model.to(device)

    step = 0
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        # inputs = inputs.to(device)
        # targets = targets.to(device)
        outputs = model(inputs)
        loss = loss_function(outputs, targets)

        accelerator.log({"train_loss": loss.item()}, step=step)  # log a plain float

        # loss.backward()
        accelerator.backward(loss)

        optimizer.step()
        scheduler.step()

        step += 1

    # make sure the wandb tracker finishes correctly
    accelerator.end_training()

if __name__ == "__main__":
    main()
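
Assuming the code above is saved as train.py (a hypothetical filename), it is launched exactly as in Section 4.2:

accelerate launch train.py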

References

https://huggingface.co/docs/accelerate/v0.17.1/en/index
https://docs.wandb.ai/guides/integrations/accelerate
Hugging Face Accelerate Super Charged With Weights & Biases
