Since the project code is fairly complex and hard to read…, I tried using Hugging Face's Accelerate to implement multi-GPU distributed training.
Accelerate's main job is distributed training. At the start of a project you may well run on a single GPU, but to speed up training you will want multiple GPUs. That said, for debugging the code, running on the CPU is recommended, since the errors produced there are more meaningful.
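For example, CPU execution can be forced directly in code (a minimal sketch; cpu=True is a real Accelerator argument):

from accelerate import Accelerator

# Force everything onto the CPU while debugging; CPU stack traces are
# usually much more readable than CUDA-side asserts.
accelerator = Accelerator(cpu=True)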
The advantage of using Accelerate is that the same training code can run on a CPU, a single GPU, multiple GPUs, or TPUs with only a few added lines, while the rest of the loop stays plain PyTorch.
First, install Accelerate via pip or conda:
pip install accelerate
or
conda install -c conda-forge accelerate
Then, on the machine you will train on, configure the training setup by running:
accelerate config
Follow the prompts to complete the configuration. For other configuration methods, such as writing the yaml file directly, see the official tutorial.
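For reference, here is a sketch of roughly what the generated file can look like for a single-machine, 2-GPU run (the exact keys vary with the Accelerate version and your answers to the prompts; the default location is, to my knowledge, ~/.cache/huggingface/accelerate/default_config.yaml):

# illustrative default_config.yaml for 2 GPUs on one machine
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 2
use_cpu: false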
To inspect the resulting configuration:
accelerate env
The migration below follows the official tutorial:
https://huggingface.co/docs/accelerate/basic_tutorials/migration
Start from a typical native PyTorch training loop:
device = "cuda"
model.to(device)
for batch in training_dataloader:
optimizer.zero_grad()
inputs, targets = batch
inputs = inputs.to(device)
targets = targets.to(device)
outputs = model(inputs)
loss = loss_function(outputs, targets)
loss.backward()
optimizer.step()
scheduler.step()
So how do we add Accelerate to this code?
from accelerate import Accelerator

accelerator = Accelerator()  # first, create an instance

# pass everything training-related into prepare()
model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

# device = "cuda"
# model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    # inputs = inputs.to(device)
    # targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    # loss.backward()
    accelerator.backward(loss)
    optimizer.step()
    scheduler.step()
That is the whole migration; it's quite simple.
Note: if you still need a device, it is no longer "cuda" but rather:
# device = 'cuda'
device = accelerator.device
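For example (a minimal sketch; the mask tensor is purely illustrative), anything that prepare() does not move for you can be sent to accelerator.device manually:

import torch
from accelerate import Accelerator

accelerator = Accelerator()
device = accelerator.device  # the device assigned to this process

# illustrative: an extra tensor that prepare() never sees
attention_mask = torch.ones(4, 128, dtype=torch.bool).to(device)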
Launching follows the official tutorial:
https://huggingface.co/docs/accelerate/v0.17.1/en/basic_tutorials/launch
First, wrap the code above in a function and make it callable as a script, e.g.:
  from accelerate import Accelerator

+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

+ if __name__ == "__main__":
+     main()
Since we already configured above, this step can be skipped; but if you want a different training setup, say going from 2 GPUs to 3, reconfigure with:
accelerate config
Then launch the script:
accelerate launch {script_name.py} {--arg1} {--arg2} ...
These are only the simplest commands; for more complex usage, such as launching with a custom configuration file, see the official tutorial.
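For example (train.py and the yaml path are hypothetical; --config_file is an actual accelerate launch option), launching with a saved configuration file looks like:

accelerate launch --config_file my_config.yaml train.py --arg1 value1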
Setting up tracking with wandb follows these guides:
https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking
https://docs.wandb.ai/guides/integrations/accelerate
I stared at the Hugging Face tutorial for quite a while without working out how to pass extra wandb run parameters (I'm clearly not good enough!), and finally found the answer in the wandb docs… pass them through the init_kwargs argument.
Example:
from accelerate import Accelerator

# Tell the Accelerator object to log with wandb
accelerator = Accelerator(log_with="wandb")

# Initialise your wandb run, passing wandb parameters and any config information
accelerator.init_trackers(
    project_name="my_project",
    config={"dropout": 0.1, "learning_rate": 1e-2},
    init_kwargs={"wandb": {"entity": "my-wandb-team"}},
)

...

# Log to wandb by calling `accelerator.log`, `step` is optional
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step)

# Make sure that the wandb tracker finishes correctly
accelerator.end_training()
Finally, the complete code:
from accelerate import Accelerator

def main():
    accelerator = Accelerator(log_with="wandb")  # first, create an instance

    accelerator.init_trackers(
        project_name="my_project",
        config={"dropout": 0.1, "learning_rate": 1e-2},
        init_kwargs={"wandb": {"entity": "my-wandb-team"}},
    )

    # pass everything training-related into prepare()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    # device = "cuda"
    # model.to(device)

    step = 0
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        # inputs = inputs.to(device)
        # targets = targets.to(device)
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
        accelerator.log({"train_loss": loss}, step=step)
        # loss.backward()
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        step += 1

    # make sure the wandb tracker finishes correctly
    accelerator.end_training()

if __name__ == "__main__":
    main()
References:
https://huggingface.co/docs/accelerate/v0.17.1/en/index
https://docs.wandb.ai/guides/integrations/accelerate
Hugging Face Accelerate Super Charged With Weights & Biases