Installing the VoxelNeXt environment and running KITTI (VoxelNeXt environment configuration)

Author: 小丑西瓜9 | 2024-04-25 01:57:53

VoxelNeXt environment configuration

GPU: A100

Install CUDA 11.3.1

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run

Install cuDNN

sudo dpkg -i libcudnn8           
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8

Create the virtual environment

conda create -n pcdet python==3.8

Install torch

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
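
The `+cu113` suffix on the wheel is what ties this torch build to CUDA 11.3. Below is a minimal sketch of reading that tag and comparing it with the release reported by `nvcc --version`. The helper names are hypothetical, and the sample nvcc line is only an illustration of the usual output format, not output captured from this machine.

```python
import re
from typing import Optional


def cuda_from_wheel_tag(version: str) -> Optional[str]:
    """Return the CUDA version encoded in a local tag such as '1.11.0+cu113'."""
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None  # CPU-only or untagged build
    digits = match.group(1)
    # The last digit is the minor version: 'cu113' -> '11.3', 'cu102' -> '10.2'.
    return f"{digits[:-1]}.{digits[-1]}"


def cuda_from_nvcc(output: str) -> Optional[str]:
    """Pull the release number out of `nvcc --version` text."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


# Illustrative inputs (assumed, not taken from the post's machine).
wheel_cuda = cuda_from_wheel_tag("1.11.0+cu113")
toolkit_cuda = cuda_from_nvcc("Cuda compilation tools, release 11.3, V11.3.109")
print("wheel:", wheel_cuda, "toolkit:", toolkit_cuda, "match:", wheel_cuda == toolkit_cuda)
```

In a live environment, `torch.__version__` and `torch.version.cuda` report the same information directly.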

Install spconv

# Make sure neither cumm nor spconv is already installed
pip list | grep spconv
pip list | grep cumm

pip install spconv-cu113 -i https://pypi.mirrors.ustc.edu.cn/simple/
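
The two `pip list | grep` checks above can also be done from Python. This is a rough sketch using only the standard library; `find_installed` is a hypothetical helper name, and matching by name prefix is an assumption meant to catch variants such as `spconv-cu113` or `cumm-cu113`.

```python
from importlib import metadata
from typing import Dict, Iterable


def find_installed(prefixes: Iterable[str]) -> Dict[str, str]:
    """Map each installed distribution whose name starts with a prefix to its version."""
    wanted = tuple(p.lower() for p in prefixes)
    found = {}
    for dist in metadata.distributions():
        # Some broken installs have no Name field, so fall back to an empty string.
        name = (dist.metadata["Name"] or "").lower()
        if name.startswith(wanted):
            found[name] = dist.version
    return found


leftovers = find_installed(["spconv", "cumm"])
if leftovers:
    print("uninstall these before installing spconv:", leftovers)
else:
    print("no spconv or cumm packages found")
```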


Install pcdet

python setup.py develop -i https://pypi.mirrors.ustc.edu.cn/simple/

Install cv2

pip install opencv-python -i https://pypi.mirrors.ustc.edu.cn/simple/

python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
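
After the info-generation command finishes, it can help to confirm that the info files were actually written. The sketch below assumes the file names that a typical OpenPCDet KITTI setup produces under `data/kitti`; neither the names nor the directory come from this post, so adjust them to your checkout.

```python
from pathlib import Path
from typing import List

# Assumed output names from a typical OpenPCDet KITTI setup (not stated in this post).
EXPECTED = [
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
    "kitti_infos_trainval.pkl",
    "kitti_infos_test.pkl",
    "kitti_dbinfos_train.pkl",
]


def missing_info_files(kitti_root: str) -> List[str]:
    """Return the expected info files that are not present under kitti_root."""
    root = Path(kitti_root)
    return [name for name in EXPECTED if not (root / name).is_file()]


# 'data/kitti' is an assumed location relative to the repository root.
print("missing:", missing_info_files("data/kitti"))
```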

vim /home/suwei/suwei_ws/OpenPCDet/pcdet/datasets/__init__.py

The result only reached 76+.

Running voxelnext_ioubranch-maxpool.yaml

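Reading the VoxelNeXt documentation more carefully, I found that the Waymo configs also use operations such as IOU_branch and maxpool.
IOU_branch can only be enabled under Waymo, and maxpool additionally requires spconv-plus, a library developed by the authors.
I modified the yaml file, mainly switching the dataset from Waymo to KITTI.
The next step is installing spconv-plus.
First I created a new environment by cloning the previous one, then uninstalled pccm, ccimport, cumm, and spconv-cu113.
Then download the spconv-plus source: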

git clone https://github.com/dvlab-research/spconv-plus.git

Then install pccm 0.3.4, ccimport 0.3.7, and cumm 0.2.8:

pip install pccm==0.3.4 

pip install ccimport==0.3.7

pip install cumm==0.2.8

Then enter the spconv-plus directory and build the wheel:

python setup.py bdist_wheel

Then cd dist/ and install the wheel that was built:

pip install xxx.whl
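
The wheel file name depends on the build, which is why it appears as a placeholder above. Here is a small sketch for locating it programmatically instead of typing the name; `find_built_wheel` is a hypothetical helper, and it assumes `dist/` holds exactly one wheel.

```python
from pathlib import Path


def find_built_wheel(dist_dir: str) -> Path:
    """Return the single .whl in dist_dir, or raise if there is none or more than one."""
    wheels = sorted(Path(dist_dir).glob("*.whl"))
    if len(wheels) != 1:
        raise FileNotFoundError(
            f"expected exactly one wheel in {dist_dir}, found {len(wheels)}"
        )
    return wheels[0]


try:
    print("pip install", find_built_wheel("dist"))
except FileNotFoundError as err:
    print(err)
```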

That completes the spconv installation.

But it kept failing with an error:

spconv has no SparseModule

I then found the following in pcdet/utils/spconv_utils.py (the print calls are my own debugging additions):

import spconv
if float(spconv.__version__[2:]) >= 2.2:
    spconv.constants.SPCONV_USE_DIRECT_TABLE = False
    print(1)
try:
    import spconv.pytorch as spconv
    print(2)
except:
    import spconv as spconv
    print(3)
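
The version test in that snippet is easy to misread, so here is a worked example of how it evaluates. It drops the first two characters of the version string and compares the remainder, as a float, against 2.2. The sample version strings other than 2.1.21 are only illustrations.

```python
def passes_direct_table_check(version: str) -> bool:
    """Mirror `float(spconv.__version__[2:]) >= 2.2` for a given version string."""
    # Dropping the first two characters turns '2.1.21' into '1.21'.
    return float(version[2:]) >= 2.2


for v in ("2.1.21", "2.1.25", "2.3.6"):
    print(v, "->", v[2:], passes_direct_table_check(v))
```

For 2.1.21 the remainder is 1.21, so the condition is false and the `SPCONV_USE_DIRECT_TABLE` line is skipped.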

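Running the program printed 2 and 3, which indicates that import spconv.pytorch as spconv could not be completed. That statement is the import form used from spconv 2.x onward,
and the version I had installed was 2.1.21 (spconv-plus).
So I opened a Python shell and typed import spconv.pytorch as spconv, which raised: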

import spconv.core_cc as _ext
ImportError: arg(): could not convert default argument 'timer: tv::CUDAKernelTimer' in method '<class 'spconv.core_cc.cumm.gemm.main.GemmParams'>.init' into a Python object (type not registered yet?)

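This looks like a CUDA/PyTorch version mismatch, but I did not want to reinstall everything,
so I tried a different approach.
Running directly in the original environment produced this error: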

  self.max_pool_list = [spconv.SparseMaxPool2d(k, 1, 1, subm=True, algo=ConvAlgo.Native, indice_key='max_pool_head%d'%i) for i, k in enumerate(kernel_size_list)]
  This line fails because SparseMaxPool2d does not accept a subm argument

Looking at the maxpool code in the official spconv (spconv/pytorch/pool.py), the SparseMaxPool2d class indeed has no subm parameter, but the SparseMaxPool class it inherits from does:

class SparseMaxPool(SparseModule):
    def __init__(self,
                 ndim,
                 kernel_size: Union[int, List[int], Tuple[int, ...]] = 3,
                 stride: Optional[Union[int, List[int], Tuple[int, ...]]] = 1,
                 padding: Union[int, List[int], Tuple[int, ...]] = 0,
                 dilation: Union[int, List[int], Tuple[int, ...]] = 1,
                 indice_key: Optional[str] = None,
                 subm: bool = False,  # subm is defined here
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):

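In spconv-plus, however, the SparseMaxPool2d class does take subm, so I decided to patch the official spconv code directly.
Locate envs/xxx/lib/python3.8/site-packages/spconv/pytorch/pool.py inside the environment
and add a subm=False parameter to the SparseMaxPool2d class: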

class SparseMaxPool2d(SparseMaxPool):
    def __init__(self,
                 kernel_size,
                 stride=None,
                 padding=0,
                 dilation=1,
                 indice_key=None,
                 subm=False,
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
        super(SparseMaxPool2d,
              self).__init__(2,
                             kernel_size,
                             stride,
                             padding,
                             dilation,
                             indice_key=indice_key,
                             subm=subm,  # forward the caller's value; a hard-coded False would ignore subm=True
                             algo=algo,
                             record_voxel_count=record_voxel_count,
                             name=name)
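
When a subclass gains a `subm` argument like this, it is worth checking that the value actually reaches the base class. The sketch below uses stand-in classes (they are not the real spconv classes) to contrast a constructor that accepts `subm` but drops it with one that passes it through.

```python
class BasePool:
    """Stand-in for a base pooling class that accepts `subm` (not the real spconv class)."""

    def __init__(self, ndim, kernel_size=3, stride=1, padding=0, subm=False):
        self.ndim = ndim
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.subm = subm


class Pool2dHardCoded(BasePool):
    """Accepts `subm` but drops it: the base class always sees False."""

    def __init__(self, kernel_size, stride=None, padding=0, subm=False):
        super().__init__(2, kernel_size, stride, padding, subm=False)


class Pool2dForwarded(BasePool):
    """Accepts `subm` and passes the caller's value through."""

    def __init__(self, kernel_size, stride=None, padding=0, subm=False):
        super().__init__(2, kernel_size, stride, padding, subm=subm)


print(Pool2dHardCoded(3, 1, 1, subm=True).subm)
print(Pool2dForwarded(3, 1, 1, subm=True).subm)
```

Only the second variant honours a call such as `SparseMaxPool2d(k, 1, 1, subm=True, ...)`.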

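Then running train.py raised an error:
spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
KeyError: this line cannot find the key voxel_indices.
When the head is voxelnext_head.py the problem does not occur. After searching through the code and comparing the two, I added the following to the forward method: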

self.forward_ret_dict['voxel_indices'] = voxel_indices
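
The pattern behind this fix is that `forward` caches values in `forward_ret_dict` and a later step reads them back, so a key that is never stored surfaces as a `KeyError` far from its cause. Here is a toy sketch of that pattern with plain lists instead of tensors; the class and method names are hypothetical, and treating column 0 as the batch index is an assumption.

```python
class HeadSketch:
    """Toy head: forward() caches values that a later step reads back."""

    def __init__(self):
        self.forward_ret_dict = {}

    def forward(self, voxel_indices):
        # Without this line, spatial_indices() raises KeyError: 'voxel_indices'.
        self.forward_ret_dict["voxel_indices"] = voxel_indices
        return voxel_indices

    def spatial_indices(self):
        # List equivalent of `[:, 1:]` on a 2-D tensor: drop the first column.
        return [row[1:] for row in self.forward_ret_dict["voxel_indices"]]


head = HeadSketch()
head.forward([[0, 5, 7], [1, 2, 3]])
print(head.spatial_indices())
```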

After that, training ran normally. I will report the results tomorrow.

A bug appeared when running the test:

  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 526, in forward_test
    x_hm_max = max_pool(x_hm, True)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

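The cause is again a difference between the two spconv code bases: in spconv-plus, the forward of SparseMaxPool (which SparseMaxPool2d inherits from) takes one extra parameter: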

def forward(self, input, return_inverse=False):

I simply changed line 526 of voxelnext_head_maxpool.py to:

x_hm_max = max_pool(x_hm)
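
Instead of editing the call site by hand, the extra argument could be passed only when the installed pooling class accepts it. This is a rough sketch using `inspect.signature` with stand-in classes (not the real spconv modules); `call_pool` is a hypothetical helper. Note that dropping `return_inverse` may change what the pooling layer computes, so this only removes the `TypeError`.

```python
import inspect


class PoolWithInverse:
    """Stand-in whose forward takes the extra flag."""

    def forward(self, input, return_inverse=False):
        return ("pooled", input, return_inverse)


class PoolPlain:
    """Stand-in whose forward takes only the input."""

    def forward(self, input):
        return ("pooled", input)


def call_pool(pool, x, return_inverse=False):
    """Pass `return_inverse` only when pool.forward declares it."""
    params = inspect.signature(pool.forward).parameters
    if "return_inverse" in params:
        return pool.forward(x, return_inverse)
    return pool.forward(x)


print(call_pool(PoolPlain(), "x_hm", True))
print(call_pool(PoolWithInverse(), "x_hm", True))
```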

And then another bug:

Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 219, in main
    repeat_eval_ckpt(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/test.py", line 123, in repeat_eval_ckpt
    tb_dict = eval_utils.eval_one_epoch(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/eval_utils/eval_utils.py", line 65, in eval_one_epoch
    pred_dicts, ret_dict = model(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/voxelnext.py", line 13, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 610, in forward
    data_dict = forward_test(x, data_dict)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 528, in forward_test
    selected = (x_hm_max.features == x_hm.features).squeeze(-1)
RuntimeError: The size of tensor a (199419) must match the size of tensor b (712) at non-singleton dimension 0
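
The comparison on line 528 only works if the pooled tensor has one row per input voxel. The pure-Python sketch below contrasts two pooling behaviours under an assumption that this post does not confirm: submanifold pooling keeps exactly the input sites, while regular pooling with padding also activates neighbouring sites. It is a toy on an unbounded grid, not the spconv implementation.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def max_pool_sparse(scores: Dict[Coord, float], k: int, subm: bool) -> Dict[Coord, float]:
    """Toy k x k max pool (stride 1, padding k // 2) over a sparse 2-D score map."""
    r = k // 2
    offsets = list(product(range(-r, r + 1), repeat=2))
    if subm:
        # Submanifold: output sites are exactly the input sites.
        out_sites = set(scores)
    else:
        # Regular: every site whose window touches an input voxel becomes active.
        out_sites = {(y + dy, x + dx) for (y, x) in scores for dy, dx in offsets}
    pooled = {}
    for (y, x) in out_sites:
        window = [scores[(y + dy, x + dx)]
                  for dy, dx in offsets if (y + dy, x + dx) in scores]
        pooled[(y, x)] = max(window)
    return pooled


def local_maxima(scores: Dict[Coord, float], k: int = 3) -> List[Coord]:
    """Keep voxels whose score equals the pooled score at the same site."""
    pooled = max_pool_sparse(scores, k, subm=True)
    return sorted(site for site, s in scores.items() if pooled[site] == s)


heatmap = {(0, 0): 0.9, (0, 1): 0.5, (5, 5): 0.3}
print(len(max_pool_sparse(heatmap, 3, subm=True)), len(max_pool_sparse(heatmap, 3, subm=False)))
print(local_maxima(heatmap))
```

If that assumption holds for the installed spconv, pooling without `subm=True` would produce many more rows than the input has, which is one possible explanation for a mismatch like 199419 versus 712.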

Still not working.
This is a bug of gcc (or pybind) that the dependency cumm is built in gcc 10 and spconv is built in gcc 9. I will change cumm build env to gcc 9, please update your cumm to v0.1.10 by pip install -U cumm-cu111 after an hour.
Next step: try changing the gcc version.
