GPU: A100
Install CUDA 11.3.1
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8
sudo dpkg -i libcudnn8
conda create -n pcdet python==3.8
Install PyTorch
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
Install spconv
Make sure cumm and spconv are not already installed:
pip list | grep spconv
pip list | grep cumm
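The same check can also be scripted in Python. This is a small sketch using the standard-library importlib.metadata (Python 3.8+). The package names in the list are my assumption of the usual spconv/cumm distribution names and may need adjusting.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed distribution names; extend the list for other CUDA suffixes
    found = installed_versions(["spconv", "spconv-cu113", "cumm", "cumm-cu113"])
    leftovers = {name: ver for name, ver in found.items() if ver is not None}
    print(leftovers if leftovers else "no spconv/cumm packages found")
```

Anything printed as a leftover should be removed with pip uninstall before installing spconv-cu113.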
pip install spconv-cu113 -i https://pypi.mirrors.ustc.edu.cn/simple/
Install pcdet (from the OpenPCDet root directory)
python setup.py develop -i https://pypi.mirrors.ustc.edu.cn/simple/
Install cv2 (OpenCV)
pip install opencv_python -i https://pypi.mirrors.ustc.edu.cn/simple/
Generate the KITTI dataset infos:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
vim /home/suwei/suwei_ws/OpenPCDet/pcdet/datasets/__init__.py
The result was only 76+.
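After reading the VoxelNeXt documentation more carefully, I found that the Waymo setup also uses extra components such as IOU_branch and max pooling (maxpool).
IOU_branch can only be enabled on Waymo, and maxpool additionally requires spconv-plus, the spconv variant developed by the authors.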
Modify the yaml file, mainly switching the dataset from Waymo to KITTI.
Next, install spconv-plus.
First I created a new environment by cloning the previous one, then uninstalled pccm, ccimport, cumm and spconv-cu113.
Then download the spconv-plus source code:
git clone https://github.com/dvlab-research/spconv-plus.git
Then install pccm 0.3.4, ccimport 0.3.7 and cumm 0.2.8:
pip install pccm==0.3.4
pip install ccimport==0.3.7
pip install cumm==0.2.8
Then enter the spconv-plus directory and build a wheel:
python setup.py bdist_wheel
Then install the wheel from dist/:
cd dist/
pip install xxx.whl
This completes the spconv installation.
But it kept failing with an error saying that spconv has no SparseModule.
So I opened pcdet/utils/spconv_utils.py, which contains the following (the print calls are debug output I added):
import spconv
if float(spconv.__version__[2:]) >= 2.2:
    spconv.constants.SPCONV_USE_DIRECT_TABLE = False
    print(1)
try:
    import spconv.pytorch as spconv
    print(2)
except:
    import spconv as spconv
    print(3)
Running the program printed 2 and 3, which means import spconv.pytorch as spconv failed. That is the import path spconv uses from 2.x onward.
The version I had installed was spconv-plus 2.1.21.
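This also explains why 1 is never printed. spconv_utils.py compares float(spconv.__version__[2:]) with 2.2. In other words, it drops the leading "2." and parses the rest as a float. For 2.1.21 that gives 1.21, so the branch is skipped. The sketch below reproduces that check next to a tuple-based comparison. parse_version is a hypothetical helper, not part of pcdet or spconv.

```python
from typing import Tuple


def sliced_check(version: str) -> float:
    # Same expression as in spconv_utils.py: drop the first two characters ("2.")
    return float(version[2:])


def parse_version(version: str) -> Tuple[int, ...]:
    # "2.1.21" -> (2, 1, 21); a local suffix such as "+cu113" is dropped
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


for v in ["2.1.21", "2.2.6", "2.3.6"]:
    print(v, sliced_check(v) >= 2.2, parse_version(v) >= (2, 2))
```

Both checks agree for these three-part versions. The tuple form also handles a two-part string such as "2.2", whereas the sliced check turns that into 2.0 and returns False.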
To check, I ran import spconv.pytorch as spconv in a Python shell and got:
import spconv.core_cc as _ext
ImportError: arg(): could not convert default argument 'timer: tv::CUDAKernelTimer' in method '<class 'spconv.core_cc.cumm.gemm.main.GemmParams'>.init' into a Python object (type not registered yet?)
This looks like a CUDA/PyTorch version mismatch, but I didn't want to reinstall everything, so I tried a different approach.
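To check the suspected mismatch, a first step is to print which CUDA version the installed PyTorch was built against. With torch 1.11.0+cu113, this should be 11.3. The sketch below uses torch.__version__ and torch.version.cuda. It returns None entries when torch cannot be imported, and the 11.3 comparison is an assumption tied to this particular setup.

```python
from typing import Dict, Optional


def torch_build_info() -> Dict[str, Optional[str]]:
    """Report the installed torch version and the CUDA version it was built with."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None}
    # torch.version.cuda is None for CPU-only builds
    return {"torch": str(torch.__version__), "cuda": torch.version.cuda}


if __name__ == "__main__":
    info = torch_build_info()
    print(info)
    # Assumed target for this setup: CUDA 11.3 (spconv-cu113 / torch +cu113)
    if info["cuda"] is not None and not info["cuda"].startswith("11.3"):
        print("warning: torch was not built with CUDA 11.3")
```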
Running directly in the original environment gave an error on this line:
self.max_pool_list = [spconv.SparseMaxPool2d(k, 1, 1, subm=True, algo=ConvAlgo.Native, indice_key='max_pool_head%d'%i) for i, k in enumerate(kernel_size_list)]
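The error says that subm is not a valid parameter here.
Looking at the max-pool code in the official spconv (class SparseMaxPool2d in spconv/pytorch/pool.py), SparseMaxPool2d indeed has no subm parameter, but the SparseMaxPool class it inherits from does: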
class SparseMaxPool(SparseModule):
    def __init__(self,
                 ndim,
                 kernel_size: Union[int, List[int], Tuple[int, ...]] = 3,
                 stride: Optional[Union[int, List[int], Tuple[int, ...]]] = 1,
                 padding: Union[int, List[int], Tuple[int, ...]] = 0,
                 dilation: Union[int, List[int], Tuple[int, ...]] = 1,
                 indice_key: Optional[str] = None,
                 subm: bool = False,  # subm is here
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False,
                 name=None):
        ...  # rest of __init__ omitted
In spconv-plus, however, SparseMaxPool2d itself accepts subm, so I decided to modify the official spconv code directly:
open envs/xxx/lib/python3.8/site-packages/spconv/pytorch/pool.py in the environment and add a subm=False parameter to the SparseMaxPool2d class:
class SparseMaxPool2d(SparseMaxPool):
    def __init__(self, kernel_size, stride=None, padding=0, dilation=1,
                 indice_key=None, subm=False,  # subm added here
                 algo: Optional[ConvAlgo] = None,
                 record_voxel_count: bool = False, name=None):
        # Note: subm=False is hard-coded in the call below, so the subm
        # value passed by the caller is accepted but ignored.
        super(SparseMaxPool2d, self).__init__(
            2, kernel_size, stride, padding, dilation,
            indice_key=indice_key, subm=False, algo=algo,
            record_voxel_count=record_voxel_count, name=name)
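When patching a subclass like this, the new argument only reaches the base class if the super().__init__ call forwards it. Passing the literal subm=False discards whatever the caller gave, while subm=subm passes it on. Below is a toy illustration with hypothetical stand-in classes, so spconv is not needed. In the real pool.py the equivalent change would presumably be subm=subm in the super call, but that is an untested assumption.

```python
class SparseMaxPoolStub:
    """Hypothetical stand-in for spconv's SparseMaxPool; it only stores its flags."""

    def __init__(self, ndim, kernel_size=3, stride=1, padding=0, subm=False):
        self.ndim = ndim
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.subm = subm


class MaxPool2dHardCoded(SparseMaxPoolStub):
    # Accepts subm, but always hands subm=False to the base class
    def __init__(self, kernel_size, stride=None, padding=0, subm=False):
        super().__init__(2, kernel_size, stride, padding, subm=False)


class MaxPool2dForwarding(SparseMaxPoolStub):
    # Forwards the caller's subm to the base class
    def __init__(self, kernel_size, stride=None, padding=0, subm=False):
        super().__init__(2, kernel_size, stride, padding, subm=subm)


print(MaxPool2dHardCoded(3, 1, 1, subm=True).subm)   # False
print(MaxPool2dForwarding(3, 1, 1, subm=True).subm)  # True
```

With the hard-coded variant, a call such as SparseMaxPool2d(k, 1, 1, subm=True, ...) would build a layer configured with subm=False, meaning a regular rather than submanifold pooling.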
Then running train.py raised an error:
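spatial_indices = self.forward_ret_dict['voxel_indices'][:, 1:]
KeyError: forward_ret_dict has no key voxel_indices at this line.
With voxelnext_head.py as the head this problem does not occur. After comparing the two files, I added this line to the forward method: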
self.forward_ret_dict['voxel_indices'] = voxel_indices
After that, training ran normally. I'll post the results tomorrow.
Then running test hit a bug:
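As I understand it, the KeyError comes from how forward and the loss hand data to each other. During training, forward stores tensors in self.forward_ret_dict, and the loss reads them back later, so forward must write every key the loss uses. The toy sketch below shows that pattern with plain lists. ToyHead is hypothetical and only mimics the shape of the pcdet code. It assumes 2D indices of the form (batch, y, x), with the batch index in the first column as in spconv's indices layout, so [:, 1:] keeps the spatial coordinates.

```python
from typing import Dict, List

Row = List[int]


class ToyHead:
    """Hypothetical head mimicking the forward/loss hand-off (not the pcdet class)."""

    def __init__(self) -> None:
        self.forward_ret_dict: Dict[str, List[Row]] = {}

    def forward(self, voxel_indices: List[Row], training: bool = True) -> List[Row]:
        if training:
            # Cache what the loss will read later; without this line the loss
            # raises KeyError: 'voxel_indices'
            self.forward_ret_dict["voxel_indices"] = voxel_indices
        return voxel_indices

    def get_loss(self) -> List[Row]:
        # Equivalent of voxel_indices[:, 1:]: drop the batch column, keep (y, x)
        return [row[1:] for row in self.forward_ret_dict["voxel_indices"]]


head = ToyHead()
head.forward([[0, 5, 7], [1, 2, 3]])
print(head.get_loss())  # [[5, 7], [2, 3]]
```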
File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 526, in forward_test
x_hm_max = max_pool(x_hm, True)
File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given
The cause is again that the two spconv versions differ. In spconv-plus, the forward of SparseMaxPool (which SparseMaxPool2d inherits) takes an extra parameter:
def forward(self, input, return_inverse=False):
So I simply changed line 526 of voxelnext_head_maxpool.py to:
x_hm_max = max_pool(x_hm)
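Instead of editing the call site, another option is to check what the installed forward accepts and pass the extra flag only when the parameter exists. The sketch below does this with inspect.signature and two hypothetical stand-in classes. Note that dropping the flag means the two spconv versions may return different things, so code that uses the result is not guaranteed to keep working.

```python
import inspect


def call_max_pool(pool, x):
    """Call pool(x, True) if pool.forward has a return_inverse parameter, else pool(x)."""
    params = inspect.signature(pool.forward).parameters
    if "return_inverse" in params:
        return pool(x, True)
    return pool(x)


class PlusStylePool:
    # Hypothetical stand-in with the extra parameter (spconv-plus style)
    def forward(self, input, return_inverse=False):
        return ("plus", return_inverse)

    def __call__(self, *args):
        return self.forward(*args)


class OfficialStylePool:
    # Hypothetical stand-in with a single-input forward (official spconv style)
    def forward(self, input):
        return ("official", None)

    def __call__(self, *args):
        return self.forward(*args)


print(call_max_pool(PlusStylePool(), "x"))      # ('plus', True)
print(call_max_pool(OfficialStylePool(), "x"))  # ('official', None)
```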
Then another bug:
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 219, in main
    repeat_eval_ckpt(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/test.py", line 123, in repeat_eval_ckpt
    tb_dict = eval_utils.eval_one_epoch(
  File "/home/suwei/suwei_ws/max_voxelnext/tools/eval_utils/eval_utils.py", line 65, in eval_one_epoch
    pred_dicts, ret_dict = model(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/detectors/voxelnext.py", line 13, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/suwei/anaconda3/envs/max_voxelnext/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 610, in forward
    data_dict = forward_test(x, data_dict)
  File "../pcdet/models/dense_heads/voxelnext_head_maxpool.py", line 528, in forward_test
    selected = (x_hm_max.features == x_hm.features).squeeze(-1)
RuntimeError: The size of tensor a (199419) must match the size of tensor b (712) at non-singleton dimension 0
Still not working.
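Here is one possible reading of the size mismatch, which I have not confirmed. x_hm_max has far more rows (199419) than x_hm (712). A submanifold layer keeps exactly the input's active sites. A regular sparse pooling with stride 1 and padding instead activates every output position whose window touches an active input, so its output can be much larger. If the max pool runs as a regular rather than a submanifold layer, the element-wise == would fail in exactly this way. The sketch below is a toy 2D count of active sites that ignores the grid border; it is not spconv's implementation.

```python
from itertools import product
from typing import Set, Tuple

Coord = Tuple[int, int]


def regular_pool_sites(active: Set[Coord], k: int = 3, pad: int = 1) -> Set[Coord]:
    """Active outputs of a stride-1 regular sparse pooling (grid border ignored).

    With stride 1, output o covers inputs o - pad + d for d in [0, k), so each
    active input i activates the outputs i + pad - d.
    """
    out: Set[Coord] = set()
    for y, x in active:
        for dy, dx in product(range(k), repeat=2):
            out.add((y + pad - dy, x + pad - dx))
    return out


def subm_pool_sites(active: Set[Coord]) -> Set[Coord]:
    """Submanifold layers keep exactly the input's active sites."""
    return set(active)


sites = {(5, 5), (5, 6)}
print(len(subm_pool_sites(sites)), len(regular_pool_sites(sites)))  # 2 12
```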
This is a bug of gcc (or pybind) that the dependency cumm is built in gcc 10 and spconv is built in gcc 9. I will change cumm build env to gcc 9, please update your cumm to v0.1.10 by pip install -U cumm-cu111 after an hour.
Next I'll try changing the gcc version.
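The comment quoted above attributes the failure to cumm and spconv being built with different gcc versions (10 vs 9). Before rebuilding, it helps to confirm which gcc is on PATH. The small sketch below runs gcc -dumpversion, whose output is either just the major version or a full x.y.z string, depending on how gcc was built. gcc_major_version and parse_major are hypothetical helper names.

```python
import re
import subprocess
from typing import Optional


def parse_major(text: str) -> Optional[int]:
    """Extract the leading major version number, e.g. '9.4.0' -> 9."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


def gcc_major_version(compiler: str = "gcc") -> Optional[int]:
    """Return the major version reported by `<compiler> -dumpversion`, or None."""
    try:
        proc = subprocess.run([compiler, "-dumpversion"], capture_output=True,
                              text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_major(proc.stdout)


if __name__ == "__main__":
    print("gcc major version:", gcc_major_version())
```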