PyTorch distributed training with torch.distributed.init_process_group

torch.distributed.init_process_group(backend='nccl', init_method='env://')
  • Understanding distributed training
  1. Distributed training runs multiple processes across multiple nodes. But if the model is so large that even a single image exceeds the memory of a single GPU, the result is the same as single-machine multi-GPU data parallelism: you still run out of memory.
  2. Model parallelism is slow because at any moment only one GPU is actually computing while the others sit idle; for details see:
    PyTorch single-machine parallel training
  1. Add the relevant arguments (e.g. --local_rank) to the parser; a minimal sketch follows this step.
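A minimal sketch of the parser addition, assuming an argparse.ArgumentParser named parser as in the excerpt further below (torch.distributed.launch supplies the value for each spawned process):

parser.add_argument('--local_rank', default=0, type=int,
                    help='local rank of this process, set by torch.distributed.launch')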
  2. Before training, bind this process to its GPU using the local rank:
torch.cuda.set_device(args.local_rank)
  3. Initialize the process group:
torch.distributed.init_process_group(backend='nccl', init_method='env://')
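With init_method='env://', the rendezvous details are read from the environment variables MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE, which torch.distributed.launch sets for every process it spawns. A minimal sketch, with illustrative values, of providing them by hand for a single-process debug run:

import os
import torch

os.environ.setdefault('MASTER_ADDR', '127.0.0.1')  # address of the rank-0 process
os.environ.setdefault('MASTER_PORT', '29500')      # any free port
os.environ.setdefault('RANK', '0')                 # global rank of this process
os.environ.setdefault('WORLD_SIZE', '1')           # total number of processes
torch.distributed.init_process_group(backend='nccl', init_method='env://')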
  4. Move the model to the GPU and wrap it with DistributedDataParallel (with one process per GPU, device_ids should contain only this process's device):
self.model = self.model.cuda()
self.model = torch.nn.parallel.DistributedDataParallel(self.model, device_ids=[self.args.local_rank])
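Distributed loading also needs a DistributedSampler, so that each process reads a disjoint shard of the dataset. A minimal sketch, assuming a train_dataset and the argument names used in the excerpt below:

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

train_sampler = DistributedSampler(train_dataset)        # one shard per rank
train_loader = DataLoader(train_dataset, batch_size=args.batch_size,
                          sampler=train_sampler, num_workers=args.workers,
                          pin_memory=True)

for epoch in range(args.epochs):
    train_sampler.set_epoch(epoch)    # reshuffle consistently across ranks each epoch
    for images, target in train_loader:
        ...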

  5. Run, launching one process per GPU with torch.distributed.launch:

CUDA_VISIBLE_DEVICES=1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=4 train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 50 --batch-size 2 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset pascal --arch-lr 0.1 --distributed True --local_rank 0
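Note that torch.distributed.launch appends --local_rank=<rank> to the command of every process it spawns, so each process still receives its own rank even though --local_rank 0 appears above; on recent PyTorch releases the same job can be started with torchrun, which exposes the rank through the LOCAL_RANK environment variable instead. For reference, the argument-parsing section of the official PyTorch ImageNet example is excerpted below: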
import argparse
import os
import random
import shutil
import time
import warnings

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models

model_names = sorted(name for name in models.__dict__
    if name.islower() and not name.startswith("__")
    and callable(models.__dict__[name]))

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('data', metavar='DIR',
                    help='path to dataset')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet18',
                    choices=model_names,
                    help='model architecture: ' +
                        ' | '.join(model_names) +
                        ' (default: resnet18)')
parser.add_argument('-j', '--workers', default=4, type=int, metavar='N',
                    help='number of data loading workers (default: 4)')
parser.add_argument('--epochs', default=90, type=int, metavar='N',
                    help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
                    help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
                    metavar='N',
                    help='mini-batch size (default: 256), this is the total '
                         'batch size of all GPUs on the current node when '
                         'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
                    metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                    help='momentum')
parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
                    metavar='W', help='weight decay (default: 1e-4)',
                    dest='weight_decay')
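Putting steps 1-5 together, a minimal runnable worker might look like the sketch below. The model (resnet18), the FakeData dataset, and the hyperparameter values are placeholders chosen for illustration, not the configuration used above; launch it with python -m torch.distributed.launch --nproc_per_node=<num_gpus> this_script.py.

import argparse
import torch
import torch.nn as nn
import torchvision
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', default=0, type=int)   # set by torch.distributed.launch
    parser.add_argument('--epochs', default=1, type=int)
    parser.add_argument('--batch-size', default=32, type=int)
    args = parser.parse_args()

    # steps 2-3: bind this process to its GPU and join the process group
    torch.cuda.set_device(args.local_rank)
    torch.distributed.init_process_group(backend='nccl', init_method='env://')

    # step 4: build the model and wrap it with DistributedDataParallel
    model = torchvision.models.resnet18().cuda()
    model = DistributedDataParallel(model, device_ids=[args.local_rank])

    # distributed loading: each rank reads its own shard of the dataset
    dataset = torchvision.datasets.FakeData(transform=torchvision.transforms.ToTensor())
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=args.batch_size, sampler=sampler)

    criterion = nn.CrossEntropyLoss().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    for epoch in range(args.epochs):
        sampler.set_epoch(epoch)                        # different shuffle each epoch
        for images, target in loader:
            images = images.cuda(non_blocking=True)
            target = target.cuda(non_blocking=True)
            loss = criterion(model(images), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

if __name__ == '__main__':
    main()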