
Distributed training error on the nuScenes dataset

Error message

Traceback (most recent call last):
  File "./tools/train.py", line 261, in <module>
    main()
  File "./tools/train.py", line 250, in main
    custom_train_model(
  File "/hy-tmp/mmdetection3d-1.0.0rc6/OccNet/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
    custom_train_detector(
  File "/hy-tmp/mmdetection3d-1.0.0rc6/OccNet/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 199, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1043, in __init__
    w.start()
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/miniconda3/envs/occ/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'dict_keys' object

The failure is caused by a dict_keys object held somewhere in the dataset: it cannot be pickled, so the DataLoader worker processes that DDP needs for multi-GPU training cannot be started. A fix shared on GitHub addresses this.
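The root cause can be reproduced outside of training: a dict_keys view is simply not picklable, while a plain list of the same keys is (a minimal sketch, independent of OccNet):

```python
import pickle

d = {"cam_front": 0, "cam_back": 1}

# A dict_keys view cannot be serialized by pickle...
try:
    pickle.dumps(d.keys())
except TypeError as e:
    print(e)  # cannot pickle 'dict_keys' object

# ...but a plain list of the same keys can.
data = pickle.dumps(list(d.keys()))
print(pickle.loads(data))  # ['cam_front', 'cam_back']
```

Under the default spawn start method on this setup, every DataLoader worker receives a pickled copy of the dataset, so any such attribute triggers exactly the TypeError above.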

Adding torch.multiprocessing.set_start_method('fork') to train.py resolves the problem:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()

The torch.multiprocessing.set_start_method('fork') call tells the multiprocessing module which method to use when PyTorch starts child processes. The fork start method is the default on Linux and is generally considered the most efficient way to create child processes.

With the fork start method, the parent process creates a new copy of itself in memory (a fork), and the child process starts executing from the same memory image as the parent. The child can therefore access all of the variables and data structures the parent had at fork time, which can improve performance in some cases.
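Because fork copies the parent's memory, a child started with fork can read objects the parent created without any pickling at all, which is why it sidesteps the error above (a minimal sketch, Unix-only):

```python
import multiprocessing as mp

GLOBAL = {"created_in": "parent"}

def child(q):
    # Under fork, the child inherits the parent's memory, so GLOBAL is
    # visible here without being pickled or re-created.
    q.put(GLOBAL["created_in"])

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # fork is only available on Unix
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    print(q.get())  # parent
    p.join()
```

With the spawn method the child would instead re-import the module and would have to receive its arguments via pickle, which is exactly where dict_keys objects fail.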

However, fork also has limitations. It is not available on Windows, and it can cause problems if the child process uses libraries that are not fork-safe; in particular, CUDA cannot be re-initialized in a forked subprocess, so fork fails if CUDA has already been initialized before the workers start.

In general, fork is a good choice for most use cases, but it is important to be aware of these limitations. If you are unsure which start method to use, you can always fall back on spawn (the default on Windows and macOS), which is more portable but less efficient.
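If you need to stay on spawn, the alternative is to make the dataset itself picklable by converting any dict_keys attributes to lists. The class and attribute names below are hypothetical stand-ins, not taken from OccNet's code:

```python
import pickle

class ToyDataset:
    """Hypothetical dataset holding per-sample metadata keys."""
    def __init__(self, metadata):
        # Storing metadata.keys() directly would make this object
        # unpicklable under the spawn start method; a list pickles fine.
        self.keys = list(metadata.keys())

ds = ToyDataset({"token": "abc", "timestamp": 0})
restored = pickle.loads(pickle.dumps(ds))
print(restored.keys)  # ['token', 'timestamp']
```

This fixes the root cause rather than working around it, so it also works on platforms where fork is unavailable.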
