
Setting up a VoxelNeXt environment and running it on KITTI

VoxelNeXt environment setup

GPU: A100

Install CUDA 11.3.1

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run

Install cuDNN

sudo dpkg -i libcudnn8           
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8

Create a virtual environment

conda create -n pcdet python==3.8

Install PyTorch

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
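
After the install, it may be worth confirming that the interpreter actually sees a CUDA build of PyTorch. The following is a minimal sketch; the helper name describe_torch is made up for illustration, and it only assumes that torch may or may not be importable in the current environment.

```python
import importlib.util


def describe_torch():
    """Return a small report about the installed PyTorch, if any."""
    report = {"installed": False, "version": None, "cuda": None, "gpu_ok": None}
    if importlib.util.find_spec("torch") is None:
        return report  # torch is not installed in this interpreter
    import torch

    report["installed"] = True
    report["version"] = torch.__version__         # version string of the wheel
    report["cuda"] = torch.version.cuda           # CUDA version torch was built with
    report["gpu_ok"] = torch.cuda.is_available()  # True if a GPU is usable
    return report


print(describe_torch())
```

If the "cuda" entry does not match the toolkit installed earlier, that is a hint that a different wheel was picked up.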

Install spconv

# Make sure neither cumm nor spconv is already installed
pip list | grep spconv
pip list | grep cumm

pip install spconv-cu113 -i https://pypi.mirrors.ustc.edu.cn/simple/

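
The same check as the two pip list | grep commands can be done from Python, which is handy inside a setup script. This is a sketch only: find_installed is a hypothetical helper, and the package list passed in the example is illustrative input, not the real state of any environment.

```python
from importlib import metadata


def find_installed(prefixes, dist_names=None):
    """Return installed distribution names that start with any given prefix."""
    if dist_names is None:
        # Query the current environment when no explicit list is given.
        dist_names = [d.metadata.get("Name") for d in metadata.distributions()]
    hits = set()
    for name in dist_names:
        if name and name.lower().startswith(tuple(prefixes)):
            hits.add(name)
    return sorted(hits)


# Illustrative input only: pretend these packages are installed.
print(find_installed(["spconv", "cumm"],
                     ["torch", "spconv-cu113", "cumm-cu113", "numpy"]))
```

An empty result means no spconv or cumm build is present, so the install that follows starts from a clean state.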

Install pcdet

python setup.py develop -i https://pypi.mirrors.ustc.edu.cn/simple/

Install OpenCV (cv2)

pip install opencv-python -i https://pypi.mirrors.ustc.edu.cn/simple/

Generate the KITTI info files

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
vim /home/suwei/suwei_ws/OpenPCDet/pcdet/datasets/__init__.py

The result I got was only 76+.

Running voxelnext_ioubranch-maxpool.yaml

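After reading the VoxelNeXt documentation more carefully, I found that the Waymo setup also uses IOU_branch, maxpool, and other operations.
IOU_branch can only be enabled for Waymo, and maxpool additionally requires spconv-plus, which was developed by the authors.
I modified the yaml file, mainly to switch the dataset from Waymo to KITTI.
Next, I installed spconv-plus.
First I created a new environment by cloning the previous one, then uninstalled pccm, ccimport, cumm, and spconv-cu113.
Then I downloaded the spconv-plus source code: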

git clone https://github.com/dvlab-research/spconv-plus.git

Then install pccm 0.3.4, ccimport 0.3.7, and cumm 0.2.8:

pip install pccm==0.3.4
pip install ccimport==0.3.7
pip install cumm==0.2.8

Then enter the spconv-plus directory and build the wheel:

python setup.py bdist_wheel

Then cd dist/ and install the wheel that was built:

pip install xxx.whl

That completes the spconv-plus installation.

But it kept failing with this error:

spconv has no SparseModule

I then found the following in pcdet/utils/spconv_utils.py:

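import spconv
# [2:] drops the first two characters of the version string before the float compare.
if float(spconv.__version__[2:]) >= 2.2:
    spconv.constants.SPCONV_USE_DIRECT_TABLE = False
    print(1)  # debug marker
try:
    import spconv.pytorch as spconv  # spconv 2.x import path
    print(2)  # debug marker
except:
    import spconv as spconv  # fall back to the top-level module
    print(3)  # debug marker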
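
To see what the version test in that snippet evaluates to, it can be reproduced on plain strings without importing spconv. The sketch below mirrors the float(version[2:]) expression and, for comparison, parses the version into a tuple of integers. Both function names are made up for illustration, and the version strings are just example inputs.

```python
def legacy_check(version):
    """Mirror the check above: drop the first two characters, compare as float."""
    return float(version[2:]) >= 2.2


def parse_version(version):
    """Parse 'major.minor.patch' into a tuple of ints (ignores any '+local' suffix)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


for v in ("2.1.21", "2.3.6"):
    # Print the string, the legacy result, and a tuple-based comparison.
    print(v, legacy_check(v), parse_version(v) >= (2, 2))
```

For "2.1.21" both checks are False, so the SPCONV_USE_DIRECT_TABLE line is skipped. The two checks need not agree for every version string, because the float compare looks at the text after the first two characters rather than at the full version.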

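Running the program printed 2, 3, which shows that import spconv.pytorch as spconv does not work. That statement is the import form used from spconv 2.x onward,
and the version I installed is 2.1.21 (spconv-plus).
So I opened a Python shell and typed import spconv.pytorch as spconv, which raised: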

import spconv.core_cc as _ext
ImportError: arg(): could not convert default argument 'timer: tv::CUDAKernelTimer' in method '<class 'spconv.core_cc.cumm.gemm.main.GemmParams'>.init' into a Python object (type not registered yet?)

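This looks like a CUDA/PyTorch version mismatch, but I did not want to reinstall everything,
so I tried a different approach.
Running directly in the original environment gives this error: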

self.max_pool_list = [spconv.SparseMaxPool2d(k, 1, 1, subm=True, algo=ConvAlgo.Native, indice_key='max_pool_head%d'%i) for i, k in enumerate(kernel_size_list)]

The error says this call has no subm parameter.

Looking at the max-pool code in the official spconv (spconv/pytorch/pool.py), the SparseMaxPool2d class indeed has no subm parameter, but the SparseMaxPool class it inherits from does:

class SparseMaxPool(SparseModule):
    def __init__(self,
                 ndim,
                 kernel_size: Union[int, List[int], Tuple[int, ...]] = 3,
                 stride: Optional[Union[int, List[int], Tuple[int, ...]]] = 1,
                 padding: Union[int, List[int], Tuple[int, ...]] = 0,
                 dilation: Union[int, List[int], Tuple[int, ...]] = 1,
                 indice_key: Optional[str] = None,
                 subm: bool = False,  # subm is defined here
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):

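In spconv-plus, however, the SparseMaxPool2d class does have a subm parameter, so I decided to patch the official spconv code directly.
Locate envs/xxx/lib/python3.8/site-packages/spconv/pytorch/pool.py in the environment
and add a subm=False parameter to the SparseMaxPool2d class: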

class SparseMaxPool2d(SparseMaxPool):
    def __init__(self,
                 kernel_size,
                 stride=None,
                 padding=0,
                 dilation=1,
                 indice_key=None,
                 subm=False,
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
        super(SparseMaxPool2d,
              self).__init__(2,
                             kernel_size,
                             stride,
                             padding,
                             dilation,
                             indice_key=indice_key,
                             subm=False,  # hard-coded: the subm argument above is not forwarded
                             algo=algo,
                             record_voxel_count=record_voxel_count,
                             name=name)
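
One detail of the patch above is worth noting: the constructor now accepts subm, but the super().__init__ call still passes the literal subm=False, so a caller's subm=True never reaches the parent class. Whether that matters depends on how the parent uses the flag. The toy classes below are hypothetical stand-ins, not spconv code. They only illustrate the difference between hard-coding the value and forwarding it.

```python
class BasePool:
    """Stand-in for a parent class with a subm flag (not the real spconv class)."""

    def __init__(self, ndim, kernel_size, subm=False):
        self.ndim = ndim
        self.kernel_size = kernel_size
        self.subm = subm


class HardCodedPool2d(BasePool):
    """Accepts subm but always passes False to the parent."""

    def __init__(self, kernel_size, subm=False):
        super().__init__(2, kernel_size, subm=False)


class ForwardingPool2d(BasePool):
    """Accepts subm and forwards the caller's value to the parent."""

    def __init__(self, kernel_size, subm=False):
        super().__init__(2, kernel_size, subm=subm)


print(HardCodedPool2d(3, subm=True).subm)   # False
print(ForwardingPool2d(3, subm=True).subm)  # True
```

If the intent is for subm=True in the head to take effect, forwarding the argument (subm=subm) would be the variant to try.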

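Then I ran train.py and got an error at

spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]

It is a KeyError: there is no voxel_indices key at this line.
The problem does not occur when the head is voxelnext_head.py. After searching and comparing the code, I added the following to the forward method: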

self.forward_ret_dict['voxel_indices'] = voxel_indices
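
The pattern behind this KeyError is that forward caches values in a dictionary that a later step reads back. The sketch below is a toy illustration with plain lists. ToyHead and its store_indices switch are hypothetical and are not part of OpenPCDet, and the first column is assumed to be the batch index.

```python
class ToyHead:
    """Hypothetical head that caches values in forward() for a later step."""

    def __init__(self):
        self.forward_ret_dict = {}

    def forward(self, voxel_indices, store_indices=True):
        if store_indices:
            # Cache the indices so later steps can look them up.
            self.forward_ret_dict["voxel_indices"] = voxel_indices
        return voxel_indices

    def spatial_indices(self):
        # Drop the first column of every row, like [:, 1:] on a tensor.
        return [row[1:] for row in self.forward_ret_dict["voxel_indices"]]


head = ToyHead()
head.forward([[0, 5, 7], [0, 6, 8]])
print(head.spatial_indices())  # [[5, 7], [6, 8]]
```

Calling spatial_indices() on a head whose forward never stored the key raises the same kind of KeyError as in the traceback.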

After that, training ran normally. I will report the results tomorrow.

A bug appeared when running test:

  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 526, in forward_test
    x_hm_max = max_pool(x_hm, True)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

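The cause is again a difference between the two spconv code bases: in spconv-plus, the forward of SparseMaxPool (the parent of SparseMaxPool2d) takes one extra parameter: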

def forward(self, input, return_inverse=False):

I simply changed line 526 of voxelnext_head_maxpool.py to:

x_hm_max = max_pool(x_hm)
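
Instead of editing the call site for one particular spconv build, the extra argument could be passed only when the installed forward accepts it. The following is a sketch under that assumption. call_pool and the two stand-in functions are hypothetical and only imitate the two signatures discussed above.

```python
import inspect


def call_pool(pool_forward, x, return_inverse=True):
    """Pass return_inverse only if pool_forward declares that parameter."""
    params = inspect.signature(pool_forward).parameters
    if "return_inverse" in params:
        return pool_forward(x, return_inverse)
    return pool_forward(x)


def forward_two_args(x, return_inverse=False):
    """Stand-in for a forward that accepts return_inverse."""
    return ("with_flag", x, return_inverse)


def forward_one_arg(x):
    """Stand-in for a forward that accepts only the input."""
    return ("no_flag", x)


print(call_pool(forward_two_args, "x_hm"))  # ('with_flag', 'x_hm', True)
print(call_pool(forward_one_arg, "x_hm"))   # ('no_flag', 'x_hm')
```

With a real module, the bound method max_pool.forward would be inspected; inspect.signature omits self for bound methods. Note that this only avoids the TypeError and does not make the two implementations return the same thing.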

And then another bug:

Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 219, in main
    repeat_eval_ckpt(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/test.py", line 123, in repeat_eval_ckpt
    tb_dict = eval_utils.eval_one_epoch(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/eval_utils/eval_utils.py", line 65, in eval_one_epoch
    pred_dicts, ret_dict = model(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/voxelnext.py", line 13, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 610, in forward
    data_dict = forward_test(x, data_dict)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 528, in forward_test
    selected = (x_hm_max.features == x_hm.features).squeeze(-1)
RuntimeError: The size of tensor a (199419) must match the size of tensor b (712) at non-singleton dimension 0
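
The RuntimeError says the pooled features and the input features have different row counts (199419 vs 712), so the element-wise == cannot be evaluated. One possible explanation, not verified here against spconv itself, is that the pooling is not running in submanifold mode, in which case the output can contain more active sites than the input. The toy 1D pool below is an illustration only. It works on a {coordinate: value} dict, ignores grid bounds, and is not how spconv is implemented.

```python
def toy_max_pool_1d(features, kernel_size=3, subm=False):
    """Toy 1D sparse max pool over a {coordinate: value} dict (stride 1).

    subm=True  -> outputs only at the input's active coordinates.
    subm=False -> outputs at every coordinate whose window touches an input.
    """
    half = kernel_size // 2
    if subm:
        out_coords = set(features)
    else:
        out_coords = {c + d for c in features for d in range(-half, half + 1)}
    pooled = {}
    for c in sorted(out_coords):
        # Gather the active inputs that fall inside the window centred on c.
        window = [features[c + d] for d in range(-half, half + 1) if c + d in features]
        pooled[c] = max(window)
    return pooled


x = {0: 0.2, 1: 0.9, 10: 0.5}
pooled_subm = toy_max_pool_1d(x, subm=True)
pooled_full = toy_max_pool_1d(x, subm=False)
print(len(x), len(pooled_subm), len(pooled_full))  # 3 3 7

# Peak selection only lines up when both sides share the same coordinates.
peaks = [c for c in x if pooled_subm[c] == x[c]]
print(peaks)  # [1, 10]
```

In the toy, only the subm=True output has one value per input site, which is what a comparison like x_hm_max.features == x_hm.features needs.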

Still not working.
This is a bug of gcc (or pybind) that the dependency cumm is built in gcc 10 and spconv is built in gcc 9. I will change cumm build env to gcc 9, please update your cumm to v0.1.10 by pip install -U cumm-cu111 after an hour.
Next I will try changing gcc.
