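I am a complete beginner and not very familiar with servers or deep learning. I mainly followed the links below; corrections are welcome.
Pitfalls of reproducing the PointRCNN code on a cloud server (Matt今年18岁's blog, CSDN); Zhihu
PointRCNN: principles and reproduction (啦咔咔儿's blog, CSDN)
Code | Paper
Official site: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
Download the image_2, velodyne, calib, and label (label_2) data.
Put the prepared dataset into the object folder, then symlink it to KITTI/object under the PointRCNN directory.
Linux commands: deleting a symlink (KoMiles, cnblogs.com)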
- PointRCNN
- ├── data
- │   ├── KITTI
- │   │   ├── ImageSets
- │   │   ├── object
- │   │   │   ├── training
- │   │   │   │   ├── calib & velodyne & label_2 & image_2 & (optional: planes)
- │   │   │   ├── testing
- │   │   │   │   ├── calib & velodyne & image_2
- ├── lib
- ├── pointnet2_lib
- ├── tools
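Before going further, it can help to confirm that the dataset really matches the tree above. The following is a minimal sketch, not part of the PointRCNN repository; the function name `missing_kitti_dirs` and the relative path `data/KITTI/object` are assumptions made for illustration.

```python
from pathlib import Path

# Sub-folders the directory tree above expects under data/KITTI/object.
# "planes" is optional, so it is not checked here.
EXPECTED = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(object_root):
    """Return the expected sub-folders that are absent under object_root."""
    root = Path(object_root)
    missing = []
    for split, names in EXPECTED.items():
        for name in names:
            # is_dir() follows symlinks, so a linked dataset passes as well.
            if not (root / split / name).is_dir():
                missing.append(f"{split}/{name}")
    return missing


if __name__ == "__main__":
    # Assumed location, relative to the PointRCNN repository root.
    for entry in missing_kitti_dirs("data/KITTI/object"):
        print("missing:", entry)
```

Run it from the PointRCNN root; no output means every expected folder was found.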
- conda create -n pointRCNN python=3.6
- conda activate pointRCNN
This step raised an error:
- CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
- To initialize your shell, run
-
- $ conda init <SHELL_NAME>
Fix (following a reference):
- source activate
- (base) root@autodl-container-cbc41190ac-1699b82c:~# source deactivate1
- bash: deactivate1: No such file or directory
- (base) root@autodl-container-cbc41190ac-1699b82c:~# conda deactivate
- root@autodl-container-cbc41190ac-1699b82c:~# conda activate pointRCNN
git clone --recursive https://github.com/sshaoshuai/PointRCNN.git
- conda install pytorch==1.0.0 torchvision==0.2.1 cuda100 -c pytorch
-
- pip install easydict
- pip install tqdm
- pip install tensorboardX
- pip install fire
- pip install numba
- pip install pyyaml
- pip install scikit-image
- pip install shapely
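After the installs finish, a quick way to see which packages are still missing is to ask Python whether each module can be found. This is only a sketch: `missing_modules` is a hypothetical helper, and the list uses the import names of the packages above (pyyaml imports as `yaml`, scikit-image as `skimage`).

```python
import importlib.util

# Import names for the pip packages listed above; note that pyyaml imports as
# "yaml" and scikit-image as "skimage".
REQUIRED = ["easydict", "tqdm", "tensorboardX", "fire", "numba",
            "yaml", "skimage", "shapely"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED))
```

An empty list means every dependency is visible to the active conda environment.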
Problems encountered
(1) Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Attempted fix: add the Tsinghua mirror channels
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- conda config --set show_channel_urls yes
conda update --all
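None of the above helped. It finally turned out that there were not enough idle GPUs when I started the instance, so I had booted it in no-GPU mode. Once a GPU was attached, the install worked.
(2) An outdated pip can also cause some packages to fail to install, so upgrade pip: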
pip install --upgrade pip
sh build_and_install.sh
Error (all other warnings were ignored):
ImportError: /root/miniconda3/envs/pointRCNN/lib/python3.6/site-packages/torch/lib/libmkldnn.so.0: undefined symbol: cblas_sgemm_alloc
Install MKL (following a reference):
conda install mkl=2018 -c anaconda
(1) Download the pretrained model and put it in the tools folder (download link)
(2) Quick demo
python eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt PointRCNN.pth --batch_size 1 --eval_mode rcnn --set RPN.LOC_XZ_FINE False
Error:
- Traceback (most recent call last):
- File "eval_rcnn.py", line 865, in <module>
- cfg_from_file(args.cfg_file)
- File "/root/PointRCNN/tools/../lib/config.py", line 187, in cfg_from_file
- yaml_cfg = edict(yaml.load(f))
- TypeError: load() missing 1 required positional argument: 'Loader'
Fix: change yaml.load(f) to yaml.safe_load(f) in lib/config.py (see reference).
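The TypeError appears because PyYAML 6.0 made the `Loader` argument of `yaml.load` mandatory, while `yaml.safe_load` takes none. The sketch below shows the same fix in isolation; `load_cfg` is a hypothetical helper, not a function from the repository, where the result is additionally wrapped in `edict(...)`.

```python
import yaml


def load_cfg(path):
    """Read a YAML config file into plain Python dicts and lists."""
    with open(path, "r") as f:
        # safe_load needs no Loader argument and only builds basic types.
        return yaml.safe_load(f)
```

Passing an explicit loader, as in `yaml.load(f, Loader=yaml.FullLoader)`, should also clear the error. `safe_load` is the more conservative choice, since it refuses to construct arbitrary Python objects.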
Rerunning gives another error:
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THC/THCBlas.cu:441
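I checked the Issues and found no solution so far. The likely cause is that the rented server has an RTX 3090, which does not match this CUDA version.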
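The suspected mismatch can be checked directly. An RTX 3090 reports compute capability 8.6, and PyTorch builds for CUDA 10.0 predate that architecture. The sketch below compares a device capability against the list of compiled architectures. `arch_supported` is a hypothetical helper, and its rule (same major version, compiled minor version not above the device's) is a simplification that ignores PTX fallback.

```python
import re


def arch_supported(capability, arch_list):
    """Check whether binaries compiled for arch_list can run on the device.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  strings such as "sm_75", as returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)(\d)", arch)
        if match is None:
            continue  # skip PTX entries such as "compute_80"
        a_major, a_minor = int(match.group(1)), int(match.group(2))
        # Simplified rule: same major version, compiled minor <= device minor.
        if a_major == major and a_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    # get_arch_list() is absent from old PyTorch releases, hence the hasattr check.
    if torch is not None and torch.cuda.is_available() and hasattr(torch.cuda, "get_arch_list"):
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, arch_supported(cap, archs))
```

If this prints `False` on the target machine, installing a PyTorch build compiled against a newer CUDA release is the likely remedy.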
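Following the links below, I created a new virtual environment with PyTorch 1.8 and modified the code.
Ubuntu 20.04 + RTX 3090 PointRCNN reproduction notes (BadgerL's blog, CSDN)
Reproducing PointRCNN on Ubuntu 16.04 + RTX 3080 + pytorch1.7.1 + cu110 (少年NG's blog, CSDN)
I then ran sh build_and_install.sh again and reran the evaluation test, which succeeded.
(1) Train your own model, mainly following the author's README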
- python generate_gt_database.py --class_name 'Car' --split train
- (Using the symlink still raised an error here. I have not found a fix yet, so for now the dataset is stored alongside the code.)
-
- ## Stage 1
- python train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200
-
- ## Stage 2
- python train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 4 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth
After training, the checkpoints are saved in PointRCNN/output/rcnn/default/ckpt.
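Since the checkpoints follow the `checkpoint_epoch_N.pth` naming seen in the commands here, the newest one can be picked programmatically rather than typed by hand. This is a minimal sketch; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint_epoch_N.pth path with the largest N, or None."""
    pattern = re.compile(r"checkpoint_epoch_(\d+)\.pth")
    best_epoch, best_path = -1, None
    for path in Path(ckpt_dir).glob("checkpoint_epoch_*.pth"):
        match = pattern.fullmatch(path.name)
        # Compare epochs numerically so that epoch 10 sorts after epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

The returned path can be passed to `--ckpt` when running eval_rcnn.py.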
(2) Generate the files for visualization; this step was modified following other blog posts
- python eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt ../output/rcnn/default/ckpt/checkpoint_epoch_70.pth --batch_size 2 --eval_mode rcnn
-
- ## evaluate all the checkpoints
- python eval_rcnn.py --cfg_file cfgs/default.yaml --eval_mode rcnn --eval_all
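My own computer is dual-boot, so I chose to download the generated predictions and visualize them locally.
(1) Data preparation
Arrange the KITTI dataset in the following structure and create a symlink: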
- kitti
-     object
-         testing
-             calib
-                 000000.txt
-             image_2
-                 000000.png
-             label_2
-                 000000.txt
-             velodyne
-                 000000.bin
-             pred
-                 000000.txt
-         training
-             calib
-                 000000.txt
-             image_2
-                 000000.png
-             label_2
-                 000000.txt
-             velodyne
-                 000000.bin
-             pred
-                 000000.txt
-
- ## Symlink
- cd kitti_object_vis/data
- ln -s ../PointRCNN/data/KITTI/object object
(Note: first delete the object folder that ships with the downloaded kitti_object_vis code.)
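Placing the downloaded predictions into the `pred` folder can be scripted. The sketch below copies every `.txt` result into `<split>/pred` and reports velodyne frames that have no prediction. `copy_predictions` and both of its arguments are hypothetical; where the evaluation results end up depends on your own run, so that path is passed in rather than assumed.

```python
import shutil
from pathlib import Path


def copy_predictions(result_dir, split_dir):
    """Copy *.txt predictions into <split_dir>/pred; return frames lacking one."""
    split = Path(split_dir)
    pred_dir = split / "pred"
    pred_dir.mkdir(parents=True, exist_ok=True)
    for txt in Path(result_dir).glob("*.txt"):
        shutil.copy(txt, pred_dir / txt.name)
    # Every velodyne frame is checked for a prediction file with the same stem.
    frames = sorted(p.stem for p in (split / "velodyne").glob("*.bin"))
    return [f for f in frames if not (pred_dir / f"{f}.txt").exists()]
```

Frames outside the evaluated subset will show up in the returned list, which is expected; only those frames lack a prediction to display.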
(2) Environment setup
- ## Create and activate the virtual environment
- conda create -n kitti_vis python=3.7 # vtk does not support python 3.8
- conda activate kitti_vis
-
- ## Install packages
- pip install opencv-python pillow scipy matplotlib
- conda install mayavi -c conda-forge
-
- ## Test the code
- python kitti_object.py --show_lidar_with_depth --img_fov --const_box --vis
Error:
- QObject::moveToThread: Current thread (0x180c7f0) is not the object's thread (0x1e3c090).
- Cannot move to target thread (0x180c7f0)
-
- qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/chen/anaconda3/envs/kitti_vis/lib/python3.7/site-packages/cv2/qt/plugins" even though it was found.
- This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
-
- Available platform plugins are: xcb, eglfs, minimal, minimalegl, offscreen, vnc, webgl.
-
- Aborted (core dumped)
Fix, following a post on OpenCV's QObject::moveToThread error:
Rename the qt directory inside the cv2 package of the kitti_vis environment to qt.bak.
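The rename can be done by hand or with a few lines of Python. The sketch below is one way to do it; `disable_cv2_qt` is a hypothetical helper, and the cv2 directory must be supplied by the caller.

```python
from pathlib import Path


def disable_cv2_qt(cv2_dir):
    """Rename <cv2_dir>/qt to qt.bak; return True if a rename happened."""
    qt_dir = Path(cv2_dir) / "qt"
    backup = Path(cv2_dir) / "qt.bak"
    # Do nothing if qt is absent or a backup already exists.
    if qt_dir.is_dir() and not backup.exists():
        qt_dir.rename(backup)
        return True
    return False
```

For the environment in the error message above, the argument would be /home/chen/anaconda3/envs/kitti_vis/lib/python3.7/site-packages/cv2. Renaming rather than deleting keeps the change easy to undo.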
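(3) Display the prediction results, following "Pitfalls of reproducing the PointRCNN code on a cloud server"
The same error as above appeared again and was finally resolved by following the linked post. Final configuration: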
- pip install opencv-python==4.1.2.30
- pip install pillow scipy matplotlib
- pip install vtk==8.1.2
- pip install mayavi==4.7.4
- pip install PyQt5==5.15.6
Rerunning the test code gives an error:
- libGL error: MESA-LOADER: failed to open iris: /usr/lib/dri/iris_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: iris
- libGL error: MESA-LOADER: failed to open iris: /usr/lib/dri/iris_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: iris
- libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: swrast
- ERROR: In /work/standalone-x64-build/VTK-source/Rendering/OpenGL2/vtkXOpenGLRenderWindow.cxx, line 606
- vtkXOpenGLRenderWindow (0x3f6b640): Cannot create GLX context. Aborting.
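Searching suggested this might be an Anaconda problem. I tried the fix from the linked post, but it did not help.
Another workaround is to create a symlink at /usr/lib/dri/iris_dri.so. Rerunning the script then gives: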
- libGL error: MESA-LOADER: failed to open iris: /lib/x86_64-linux-gnu/libLLVM-12.so.1: undefined symbol: ffi_type_sint32, version LIBFFI_BASE_7.0 (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: iris
- libGL error: MESA-LOADER: failed to open iris: /lib/x86_64-linux-gnu/libLLVM-12.so.1: undefined symbol: ffi_type_sint32, version LIBFFI_BASE_7.0 (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: iris
- libGL error: MESA-LOADER: failed to open swrast: /lib/x86_64-linux-gnu/libLLVM-12.so.1: undefined symbol: ffi_type_sint32, version LIBFFI_BASE_7.0 (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
- libGL error: failed to load driver: swrast
- no pred file
Final solution: https://zhuanlan.zhihu.com/p/531801732
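(1) First install TensorFlow, following the linked guide (only after installing did I discover that the server already had TensorBoard preinstalled, so treat this as a record)
Pick a suitable version and install it with conda:
conda install tensorflow==2.5.0
Install ipykernel:
conda install ipykernel
Register the new environment as a notebook kernel:
python -m ipykernel install --user --name env_tensorflow --display-name "env_tensorflow"
Finally, open a notebook from the launcher page and test it. A few warnings appeared and were ignored.
(2) Launch TensorBoard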