
Installing the SlowFast environment and testing the model on Ubuntu 18.04 (Python 3.9)


When installing with pip, a mirror inside China is recommended, e.g. pip install xxx -i https://pypi.tuna.tsinghua.edu.cn/simple

Contents

1. Create the conda env

2. install pytorch 

3. install fvcore

4. install simplejson

5. Check the gcc version

6. PyAV

7.ffmpeg with PyAV

8. PyYaml , tqdm

9.iopath

10. psutil

11. opencv

12. tensorboard

13. moviepy

14. PyTorchVideo

15. Detectron2

16. FairScale

17. SlowFast

Run the demo to test the model

Errors encountered during installation

error0

error1

error2

error3

error4

error5

error6

error7


1. Create the conda env

conda create -n py39 python=3.9

2. install pytorch 

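First check the CUDA version, then choose the matching PyTorch version.

Check the highest CUDA version supported by the installed NVIDIA driver (nvidia-smi reports it).

Check the CUDA version currently installed (for example with nvcc -V).

Install pytorch and torchvision for that CUDA version: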

conda activate py39
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
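The two version checks described above can also be scripted. The following is a minimal sketch, not part of the original walkthrough: the helper name run_version_command is made up for illustration, and it only assumes that nvidia-smi and nvcc are on PATH when they are installed. It returns None instead of failing when a tool is missing.

```python
import shutil
import subprocess
from typing import List, Optional


def run_version_command(command: List[str]) -> Optional[str]:
    """Return the command's combined output, or None if the tool is missing or fails to run."""
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


if __name__ == "__main__":
    # nvidia-smi prints the driver version and the highest CUDA version it supports.
    print(run_version_command(["nvidia-smi"]) or "nvidia-smi not found")
    # nvcc --version prints the version of the installed CUDA toolkit.
    print(run_version_command(["nvcc", "--version"]) or "nvcc not found")
```

The cudatoolkit version passed to conda install should not exceed the CUDA version reported by nvidia-smi.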

3. install fvcore

pip install git+https://github.com/facebookresearch/fvcore

4. install simplejson

pip install simplejson 

5. Check the gcc version

gcc -v

The version is 7.5.0.

6. PyAV

conda install av -c conda-forge

7.ffmpeg with PyAV

pip install av

8. PyYaml, tqdm

Both are installed together with fvcore. To confirm that they are present:

pip list | grep -i -E "pyyaml|tqdm"

9.iopath

pip install -U iopath

10. psutil

pip install psutil

11. opencv

pip install opencv-python

12. tensorboard

Check whether tensorboard is already installed:

conda list tensorboard

tensorboard was not installed, so install it:

pip install tensorboard

13. moviepy

pip install moviepy

14. PyTorchVideo

pip install pytorchvideo

15. Detectron2

git clone https://github.com/facebookresearch/detectron2 detectron2_repo

pip install -e detectron2_repo

16. FairScale

pip install git+https://github.com/facebookresearch/fairscale

17. SlowFast

git clone https://github.com/facebookresearch/SlowFast.git


cd SlowFast
python setup.py build develop
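After step 17, a quick way to see which packages are still missing is to look up each import name without actually importing it. This is a sketch under assumptions: the list below uses the usual import names for the packages in this walkthrough (for example cv2 for opencv-python and yaml for PyYaml), plus scipy and sklearn, which show up in the errors further down. The function name find_missing is hypothetical.

```python
import importlib.util
from typing import List

# Import names assumed for the packages installed in steps 2-17,
# plus scipy and sklearn from the error list.
REQUIRED_MODULES = [
    "torch", "torchvision", "fvcore", "simplejson", "av", "yaml", "tqdm",
    "iopath", "psutil", "cv2", "tensorboard", "moviepy", "pytorchvideo",
    "detectron2", "fairscale", "slowfast", "scipy", "sklearn",
]


def find_missing(modules: List[str]) -> List[str]:
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", find_missing(REQUIRED_MODULES))
```

find_spec only checks that a module can be located. It does not execute the import, so link-time failures such as the one in error3 will not be detected by this check.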

Run the demo to test the model

python3 tools/run_net.py --cfg demo/AVA/SLOWFAST_32x2_R101_50_50.yaml

Errors encountered during installation

error0

PIL cannot be found ("not find PIL").

Fix: in setup.py, change PIL to Pillow.

error1

from pytorchvideo.layers.distributed import ( # noqa
ImportError: cannot import name 'cat_all_gather' from 'pytorchvideo.layers.distributed' (/home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/layers/distributed.py)

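Fix:

Option 1: copy the contents of the pytorchvideo/ directory on the main branch of facebookresearch/pytorchvideo on GitHub (see Reference) into the corresponding directory of the virtual environment, which here is /home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/

Option 2: add the following to layers/distributed.py: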

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""Distributed helpers."""

import torch
import torch.distributed as dist
from torch._C._distributed_c10d import ProcessGroup
from torch.autograd.function import Function

_LOCAL_PROCESS_GROUP = None


def get_world_size() -> int:
    """
    Simple wrapper for correctly getting worldsize in both distributed
    / non-distributed settings
    """
    return (
        torch.distributed.get_world_size()
        if torch.distributed.is_available() and torch.distributed.is_initialized()
        else 1
    )


def cat_all_gather(tensors, local=False):
    """Performs the concatenated all_gather operation on the provided tensors."""
    if local:
        gather_sz = get_local_size()
    else:
        gather_sz = torch.distributed.get_world_size()
    tensors_gather = [torch.ones_like(tensors) for _ in range(gather_sz)]
    torch.distributed.all_gather(
        tensors_gather,
        tensors,
        async_op=False,
        group=_LOCAL_PROCESS_GROUP if local else None,
    )
    output = torch.cat(tensors_gather, dim=0)
    return output


def init_distributed_training(cfg):
    """
    Initialize variables needed for distributed training.
    """
    if cfg.NUM_GPUS <= 1:
        return
    num_gpus_per_machine = cfg.NUM_GPUS
    num_machines = dist.get_world_size() // num_gpus_per_machine
    for i in range(num_machines):
        ranks_on_i = list(
            range(i * num_gpus_per_machine, (i + 1) * num_gpus_per_machine)
        )
        pg = dist.new_group(ranks_on_i)
        if i == cfg.SHARD_ID:
            global _LOCAL_PROCESS_GROUP
            _LOCAL_PROCESS_GROUP = pg


def get_local_size() -> int:
    """
    Returns:
        The size of the per-machine process group,
        i.e. the number of processes per machine.
    """
    if not dist.is_available():
        return 1
    if not dist.is_initialized():
        return 1
    return dist.get_world_size(group=_LOCAL_PROCESS_GROUP)


def get_local_rank() -> int:
    """
    Returns:
        The rank of the current process within the local (per-machine) process group.
    """
    if not dist.is_available():
        return 0
    if not dist.is_initialized():
        return 0
    assert _LOCAL_PROCESS_GROUP is not None
    return dist.get_rank(group=_LOCAL_PROCESS_GROUP)


def get_local_process_group() -> ProcessGroup:
    assert _LOCAL_PROCESS_GROUP is not None
    return _LOCAL_PROCESS_GROUP


class GroupGather(Function):
    """
    GroupGather performs all gather on each of the local process/ GPU groups.
    """

    @staticmethod
    def forward(ctx, input, num_sync_devices, num_groups):
        """
        Perform forwarding, gathering the stats across different process/ GPU
        group.
        """
        ctx.num_sync_devices = num_sync_devices
        ctx.num_groups = num_groups

        input_list = [torch.zeros_like(input) for k in range(get_local_size())]
        dist.all_gather(
            input_list, input, async_op=False, group=get_local_process_group()
        )

        inputs = torch.stack(input_list, dim=0)
        if num_groups > 1:
            rank = get_local_rank()
            group_idx = rank // num_sync_devices
            inputs = inputs[
                group_idx * num_sync_devices : (group_idx + 1) * num_sync_devices
            ]
        inputs = torch.sum(inputs, dim=0)
        return inputs

    @staticmethod
    def backward(ctx, grad_output):
        """
        Perform backwarding, gathering the gradients across different process/ GPU
        group.
        """
        grad_output_list = [
            torch.zeros_like(grad_output) for k in range(get_local_size())
        ]
        dist.all_gather(
            grad_output_list,
            grad_output,
            async_op=False,
            group=get_local_process_group(),
        )

        grads = torch.stack(grad_output_list, dim=0)
        if ctx.num_groups > 1:
            rank = get_local_rank()
            group_idx = rank // ctx.num_sync_devices
            grads = grads[
                group_idx
                * ctx.num_sync_devices : (group_idx + 1)
                * ctx.num_sync_devices
            ]
        grads = torch.sum(grads, dim=0)
        return grads, None, None
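Before and after applying either option, it can help to check whether the installed pytorchvideo actually exposes cat_all_gather. The sketch below is only an illustration: module_has_attribute is a hypothetical helper, and the only thing it assumes about pytorchvideo is the module path quoted in the traceback above.

```python
import importlib


def module_has_attribute(module_name: str, attribute: str) -> bool:
    """Return True if the module imports cleanly and defines the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well as failed nested imports.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    # Module path and name taken from the ImportError message in error1.
    ok = module_has_attribute("pytorchvideo.layers.distributed", "cat_all_gather")
    print("cat_all_gather available:", ok)
```

If this prints False after the fix, the file that was edited is probably not the one the active environment imports; the path in the ImportError message shows which copy is in use.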

error2

from scipy.ndimage import gaussian_filter

ModuleNotFoundError: No module named 'scipy'

Fix:

pip install scipy

error3

from av._core import time_base, library_versions

ImportError: /home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/av/../../.././libgnutls.so.30: symbol mpn_copyi version HOGWEED_6 not defined in file libhogweed.so.6 with link time reference
 

Fix:

First remove the av package, then reinstall it with pip:

pip install av


error4

File "/media/cxgk/Linux/work/SlowFast/slowfast/models/losses.py", line 11, in <module>
from pytorchvideo.losses.soft_target_cross_entropy import (
ModuleNotFoundError: No module named 'pytorchvideo.losses'

Fix:

Open "/home/cxgk/anaconda3/envs/sf/lib/python3.9/site-packages/pytorchvideo/losses" (create the folder if it does not exist), create a new file soft_target_cross_entropy.py in it, and add the following code:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorchvideo.layers.utils import set_attributes
from pytorchvideo.transforms.functional import convert_to_one_hot


class SoftTargetCrossEntropyLoss(nn.Module):
    """
    Adapted from Classy Vision: ./classy_vision/losses/soft_target_cross_entropy_loss.py.
    This allows the targets for the cross entropy loss to be multi-label.
    """

    def __init__(
        self,
        ignore_index: int = -100,
        reduction: str = "mean",
        normalize_targets: bool = True,
    ) -> None:
        """
        Args:
            ignore_index (int): sample should be ignored for loss if the class is this value.
            reduction (str): specifies reduction to apply to the output.
            normalize_targets (bool): whether the targets should be normalized to a sum of 1
                based on the total count of positive targets for a given sample.
        """
        super().__init__()
        set_attributes(self, locals())
        assert isinstance(self.normalize_targets, bool)
        if self.reduction not in ["mean", "none"]:
            raise NotImplementedError(
                'reduction type "{}" not implemented'.format(self.reduction)
            )
        self.eps = torch.finfo(torch.float32).eps

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """
        Args:
            input (torch.Tensor): the shape of the tensor is N x C, where N is the number of
                samples and C is the number of classes. The tensor is raw input without
                softmax/sigmoid.
            target (torch.Tensor): the shape of the tensor is N x C or N. If the shape is N, we
                will convert the target to one hot vectors.
        """
        # Check if targets are inputted as class integers
        if target.ndim == 1:
            assert (
                input.shape[0] == target.shape[0]
            ), "SoftTargetCrossEntropyLoss requires input and target to have same batch size!"
            target = convert_to_one_hot(target.view(-1, 1), input.shape[1])

        assert input.shape == target.shape, (
            "SoftTargetCrossEntropyLoss requires input and target to be same "
            f"shape: {input.shape} != {target.shape}"
        )

        # Samples where the targets are ignore_index do not contribute to the loss
        N, C = target.shape
        valid_mask = torch.ones((N, 1), dtype=torch.float).to(input.device)
        if 0 <= self.ignore_index <= C - 1:
            drop_idx = target[:, self.ignore_index] > 0
            valid_mask[drop_idx] = 0

        valid_targets = target.float() * valid_mask
        if self.normalize_targets:
            valid_targets /= self.eps + valid_targets.sum(dim=1, keepdim=True)
        per_sample_per_target_loss = -valid_targets * F.log_softmax(input, -1)

        per_sample_loss = torch.sum(per_sample_per_target_loss, -1)
        # Perform reduction
        if self.reduction == "mean":
            # Normalize based on the number of samples with > 0 non-ignored targets
            loss = per_sample_loss.sum() / torch.sum(
                (torch.sum(valid_mask, -1) > 0)
            ).clamp(min=1)
        elif self.reduction == "none":
            loss = per_sample_loss
        return loss
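To make the core of this loss concrete, here is a small pure-Python sketch of the per-sample computation: normalize the targets so they sum to 1, then take the negative sum of target times log-softmax. It is a simplified illustration and not the pytorchvideo implementation: it handles a single sample, leaves out ignore_index and the batch reduction, and the function names are made up for this example.

```python
import math
from typing import List

# Same constant as torch.finfo(torch.float32).eps, written out to avoid needing torch.
FLOAT32_EPS = 1.1920929e-07


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(x - largest) for x in logits))
    return [x - log_sum for x in logits]


def soft_target_cross_entropy(
    logits: List[float], targets: List[float], normalize_targets: bool = True
) -> float:
    """Loss for one sample: -sum_i target_i * log_softmax(logits)_i."""
    if normalize_targets:
        total = FLOAT32_EPS + sum(targets)
        targets = [t / total for t in targets]
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # Two positive labels out of four classes, with uniform logits.
    print(soft_target_cross_entropy([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]))
```

With uniform logits every class has log-probability -log(4), so any normalized target yields a loss very close to log(4); this makes a convenient sanity check.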

error5

from sklearn.metrics import confusion_matrix

ModuleNotFoundError: No module named 'sklearn'

Fix:

pip install scikit-learn

error6

raise KeyError("Non-existent config key: {}".format(full_key))

KeyError: 'Non-existent config key: TENSORBOARD.MODEL_VIS.TOPK'

Fix:

Comment out the following three lines (the nested TENSORBOARD, MODEL_VIS and TOPK keys) in the YAML config:

TENSORBOARD

MODEL_VIS

TOPK
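The KeyError means the YAML file contains a key that the installed SlowFast default config does not define. The sketch below imitates that check with plain dictionaries so the offending keys can be listed in one pass. It is not the yacs implementation, and the two example dictionaries, including the TOPK value, are hypothetical stand-ins for the real defaults and the demo YAML.

```python
from typing import Any, Dict, List


def flatten_keys(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths such as 'A.B.C' for every leaf value in a nested dict."""
    keys = []
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys.extend(flatten_keys(value, full_key))
        else:
            keys.append(full_key)
    return keys


def unknown_keys(user_config: Dict[str, Any], default_config: Dict[str, Any]) -> List[str]:
    """Return the dotted keys present in user_config but absent from default_config."""
    known = set(flatten_keys(default_config))
    return [key for key in flatten_keys(user_config) if key not in known]


# Hypothetical defaults that lack TOPK, and a hypothetical YAML that sets it.
EXAMPLE_DEFAULTS = {"TENSORBOARD": {"ENABLE": False, "MODEL_VIS": {"ENABLE": False}}}
EXAMPLE_YAML = {"TENSORBOARD": {"ENABLE": True, "MODEL_VIS": {"TOPK": 2}}}

if __name__ == "__main__":
    print(unknown_keys(EXAMPLE_YAML, EXAMPLE_DEFAULTS))
```

Any key reported this way either has to be removed from the YAML, as done here, or be defined in the defaults of the SlowFast checkout in use.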

error7

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 2.83 GiB already allocated; 25.44 MiB free; 2.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Fix:

Reduce the number of frames in the YAML config:

DATA:
  NUM_FRAMES: 16
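Why fewer frames helps: the size of a video clip tensor, and roughly the size of the convolutional activations computed from it, grows in proportion to the number of frames. The arithmetic below is only a back-of-the-envelope sketch for the input tensor. The 256x256 crop, 3 channels, float32 values and the comparison value of 32 frames are assumptions for illustration, not values taken from the demo config.

```python
def clip_megabytes(
    num_frames: int,
    height: int = 256,        # assumed crop height
    width: int = 256,         # assumed crop width
    channels: int = 3,        # RGB
    bytes_per_value: int = 4  # float32
) -> float:
    """Size in MiB of one input clip of shape (channels, num_frames, height, width)."""
    return num_frames * height * width * channels * bytes_per_value / 2**20


if __name__ == "__main__":
    for frames in (32, 16):
        print(frames, "frames:", clip_megabytes(frames), "MiB per input clip")
```

The input itself is small next to the model's activations, but since those scale with the temporal length in a similar way, halving NUM_FRAMES should cut peak memory noticeably on a 4 GB GPU like the one in the error message.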

Reference:

https://github.com/facebookresearch/pytorchvideo/blob/main/pytorchvideo
