PointPillars is a model that came out of industry. Its core idea is to reuse an image-style processing framework: the point cloud is divided, from the bird's-eye view, into vertical columns (pillars), which are encoded into a pseudo-image; a 2D detection framework then performs feature extraction and box prediction. This design gives the model a very good balance between speed and accuracy. The PointPillars network is organized as a pillar feature encoder, a 2D convolutional backbone, and an SSD-style detection head.
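To make the pillar idea concrete before diving into the code, here is a minimal NumPy sketch of pillarization (my own illustration, not MMDetection3D code; the real PillarFeatureNet fills each occupied cell with a learned 64-dim feature instead of a point count):

```python
import numpy as np

def pillarize(points, pc_range=(0, -39.68, 69.12, 39.68), voxel=0.16):
    """Group LiDAR points into BEV pillars and build a toy pseudo-image.

    points: (N, 4) array of x, y, z, intensity.
    Returns an (H, W) image whose pixel value is the point count per pillar;
    PointPillars instead fills each occupied cell with a learned feature.
    """
    x_min, y_min, x_max, y_max = pc_range
    W = int(round((x_max - x_min) / voxel))   # 432 columns along x
    H = int(round((y_max - y_min) / voxel))   # 496 rows along y
    # keep only points inside the BEV range
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max))
    pts = points[mask]
    # integer pillar index of every remaining point
    ix = ((pts[:, 0] - x_min) / voxel).astype(np.int64)
    iy = ((pts[:, 1] - y_min) / voxel).astype(np.int64)
    pseudo_image = np.zeros((H, W), dtype=np.float32)
    np.add.at(pseudo_image, (iy, ix), 1.0)    # scatter: count points per pillar
    return pseudo_image

points = np.random.rand(1000, 4).astype(np.float32) * [69.12, 79.36, 4, 1] \
    + [0, -39.68, -3, 0]
print(pillarize(points).shape)  # (496, 432), matching output_shape later on
```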
This article walks through the PointPillars implementation in MMDetection3D, explaining every line of code and the reasoning behind it. This is my first code walkthrough, so there are bound to be shortcomings; criticism and corrections are welcome, and feel free to leave suggestions in the comments. Thank you!
The PointPillars configuration is assembled from four files: kitti-3d-3class.py, pointpillars_hv_secfpn_kitti.py, cyclic-40e.py, and default_runtime.py, each introduced in detail below.

kitti-3d-3class.py is implemented in mmdetection3d/configs/_base_/datasets/kitti-3d-3class.py:
```python
# dataset settings
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection3d/kitti/'

# Method 2: Use backend_args, file_client_args in versions before 1.1.0
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection3d/',
#         'data/': 's3://openmmlab/datasets/detection3d/'
#     }))
backend_args = None

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    backend_args=backend_args)

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,  # x, y, z, intensity
        use_dim=4,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range)
        ]),
    dict(type='Pack3DDetInputs', keys=['points'])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        backend_args=backend_args),
    dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(
    batch_size=8,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='kitti_infos_train.pkl',
            data_prefix=dict(pts='training/velodyne_reduced'),
            pipeline=train_pipeline,
            modality=input_modality,
            test_mode=False,
            metainfo=metainfo,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR',
            backend_args=backend_args)))
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(pts='training/velodyne_reduced'),
        ann_file='kitti_infos_val.pkl',
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        metainfo=metainfo,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(pts='training/velodyne_reduced'),
        ann_file='kitti_infos_val.pkl',
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        metainfo=metainfo,
        box_type_3d='LiDAR',
        backend_args=backend_args))
val_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'kitti_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
```
The important settings here: point_cloud_range crops the scene to 0–70.4 m in x, ±40 m in y and -3–1 m in z; class_names restricts training to Pedestrian, Cyclist and Car; db_sampler drives the ObjectSample ground-truth augmentation, pasting sampled objects into each training scene; train_pipeline chains point loading, augmentation, range filtering, shuffling and packing; and the val/test dataloaders reuse test_pipeline with test_mode=True. A quick way to inspect the assembled config is sketched below.
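As a quick sanity check, the merged configuration can be loaded and inspected with MMEngine's Config API; a minimal sketch, assuming a standard MMDetection3D checkout (adjust the config path to yours):

```python
from mmengine.config import Config

# load the top-level PointPillars config, which pulls in the four _base_ files
cfg = Config.fromfile(
    'configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py')

print(cfg.train_dataloader.dataset.dataset.pipeline)  # resolved train_pipeline
print(cfg.model.voxel_encoder)                        # PillarFeatureNet settings
```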
pointpillars_hv_secfpn_kitti.py is implemented in mmdetection3d/configs/_base_/models/pointpillars_hv_secfpn_kitti.py:
```python
voxel_size = [0.16, 0.16, 4]

model = dict(
    type='VoxelNet',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        voxel=True,
        voxel_layer=dict(
            max_num_points=32,  # max_points_per_voxel
            point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
            voxel_size=voxel_size,
            max_voxels=(16000, 40000))),
    voxel_encoder=dict(
        type='PillarFeatureNet',
        in_channels=4,
        feat_channels=[64],
        with_distance=False,
        voxel_size=voxel_size,
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
    middle_encoder=dict(
        type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
    backbone=dict(
        type='SECOND',
        in_channels=64,
        layer_nums=[3, 5, 5],
        layer_strides=[2, 2, 2],
        out_channels=[64, 128, 256]),
    neck=dict(
        type='SECONDFPN',
        in_channels=[64, 128, 256],
        upsample_strides=[1, 2, 4],
        out_channels=[128, 128, 128]),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        assign_per_class=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[
                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
                [0, -39.68, -0.6, 69.12, 39.68, -0.6],
                [0, -39.68, -1.78, 69.12, 39.68, -1.78],
            ],
            sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        diff_rad_by_sin=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='mmdet.CrossEntropyLoss', use_sigmoid=False,
            loss_weight=0.2)),
    # model training and testing settings
    train_cfg=dict(
        assigner=[
            dict(  # for Pedestrian
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='mmdet3d.BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Cyclist
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='mmdet3d.BboxOverlapsNearest3D'),
                pos_iou_thr=0.5,
                neg_iou_thr=0.35,
                min_pos_iou=0.35,
                ignore_iof_thr=-1),
            dict(  # for Car
                type='Max3DIoUAssigner',
                iou_calculator=dict(type='mmdet3d.BboxOverlapsNearest3D'),
                pos_iou_thr=0.6,
                neg_iou_thr=0.45,
                min_pos_iou=0.45,
                ignore_iof_thr=-1),
        ],
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_thr=0.01,
        score_thr=0.1,
        min_bbox_size=0,
        nms_pre=100,
        max_num=50))
```
The important settings here: points are voxelized into 0.16 m × 0.16 m pillars spanning the full 4 m height range; PillarFeatureNet turns each pillar into a 64-dim feature; PointPillarsScatter scatters those features into a 496×432 pseudo-image; the SECOND backbone plus SECONDFPN neck produce a 384-channel BEV feature map; and Anchor3DHead predicts boxes from per-class anchors, with looser IoU thresholds for Pedestrian/Cyclist than for Car. The shape arithmetic is worked out below.
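A small worked computation (my own check, not repo code) showing where output_shape=[496, 432] and the head's in_channels=384 come from:

```python
# BEV grid size: (range extent) / (voxel size) along each axis
x_min, y_min, _, x_max, y_max, _ = [0, -39.68, -3, 69.12, 39.68, 1]
vx, vy = 0.16, 0.16

grid_w = round((x_max - x_min) / vx)  # 69.12 / 0.16 = 432 cells along x
grid_h = round((y_max - y_min) / vy)  # 79.36 / 0.16 = 496 cells along y
print(grid_h, grid_w)  # 496 432  -> PointPillarsScatter output_shape

# SECONDFPN upsamples all three backbone stages back to a common stride
# and concatenates them: 128 + 128 + 128 channels
head_in_channels = sum([128, 128, 128])
print(head_in_channels)  # 384 -> Anchor3DHead in_channels
```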
cyclic-40e.py is implemented in mmdetection3d/configs/_base_/schedules/cyclic-40e.py:
```python
# The schedule is usually used by models trained on KITTI dataset
# The learning rate set in the cyclic schedule is the initial learning rate
# rather than the max learning rate. Since the target_ratio is (10, 1e-4),
# the learning rate will change from 0.0018 to 0.018, then go to 0.0018*1e-4
lr = 0.0018
# The optimizer follows the setting in SECOND.Pytorch, but here we use
# the official AdamW optimizer implemented by PyTorch.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, betas=(0.95, 0.99), weight_decay=0.01),
    clip_grad=dict(max_norm=10, norm_type=2))
# learning rate
param_scheduler = [
    # learning rate scheduler
    # During the first 16 epochs, learning rate increases from 0 to lr * 10
    # during the next 24 epochs, learning rate decreases from lr * 10 to
    # lr * 1e-4
    dict(
        type='CosineAnnealingLR',
        T_max=16,
        eta_min=lr * 10,
        begin=0,
        end=16,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=24,
        eta_min=lr * 1e-4,
        begin=16,
        end=40,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 16 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 24 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=16,
        eta_min=0.85 / 0.95,
        begin=0,
        end=16,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=24,
        eta_min=1,
        begin=16,
        end=40,
        by_epoch=True,
        convert_to_iter_based=True)
]

# Runtime settings, training schedule for 40e
# Although the max_epochs is 40, this schedule is usually used with
# RepeatDataset with repeat ratio N, thus the actual max epoch
# number could be N x 40
train_cfg = dict(by_epoch=True, max_epochs=40, val_interval=1)
val_cfg = dict()
test_cfg = dict()

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (6 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=48)
```
The important settings here: AdamW with an initial learning rate of 0.0018 and gradient clipping at norm 10; a cyclic schedule that cosine-anneals the learning rate up to 0.018 over the first 16 epochs and then down to 0.0018 × 1e-4 over the remaining 24, while the momentum moves in the opposite direction. The resulting learning-rate trajectory is sketched below.
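A small self-contained sketch of the standard cosine-annealing closed form that both phases use (written out by hand here instead of calling MMEngine), confirming the 0.0018 → 0.018 → 1.8e-7 trajectory described in the comments:

```python
import math

def cosine_anneal(lr_start, eta_min, t, t_max):
    """Standard cosine annealing from lr_start (t=0) to eta_min (t=t_max)."""
    return eta_min + 0.5 * (lr_start - eta_min) * (1 + math.cos(math.pi * t / t_max))

lr = 0.0018
# phase 1 (epochs 0-16): anneals "up" because eta_min = lr * 10 > lr
print(cosine_anneal(lr, lr * 10, 0, 16))          # 0.0018  at epoch 0
print(cosine_anneal(lr, lr * 10, 16, 16))         # 0.018   at epoch 16
# phase 2 (epochs 16-40): anneals down to lr * 1e-4
print(cosine_anneal(lr * 10, lr * 1e-4, 24, 24))  # 1.8e-07 at epoch 40
```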
default_runtime.py is implemented in mmdetection3d/configs/_base_/default_runtime.py:
```python
default_scope = 'mmdet3d'

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=-1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='Det3DVisualizationHook'))

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'),
)

log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)

log_level = 'INFO'
load_from = None
resume = False

# TODO: support auto scaling lr
```
The important settings here: default_scope = 'mmdet3d' makes registry lookups resolve inside MMDetection3D; default_hooks wires up iteration timing, logging every 50 iterations, LR scheduling, checkpointing, distributed-sampler seeding and visualization; load_from and resume control checkpoint loading. Downstream configs usually override individual hooks, as sketched below.
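Since these are _base_ defaults, downstream configs typically override individual hooks instead of redefining the whole dict; a hypothetical override (the interval values are chosen purely for illustration) would look like:

```python
# in a config that inherits from default_runtime.py:
_base_ = ['../_base_/default_runtime.py']

# save a checkpoint every epoch and log every 10 iterations,
# overriding only these keys of the inherited default_hooks
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1),
    logger=dict(type='LoggerHook', interval=10))
```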
With the configuration covered, we begin the code walkthrough from the parse_data_info() function. KittiDataset is implemented in mmdetection3d/mmdet3d/datasets/kitti_dataset.py:
```python
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Callable, List, Union

import numpy as np

from mmdet3d.registry import DATASETS
from mmdet3d.structures import CameraInstance3DBoxes
from .det3d_dataset import Det3DDataset


@DATASETS.register_module()
class KittiDataset(Det3DDataset):
    r"""KITTI Dataset.

    This class serves as the API for experiments on the `KITTI Dataset
    <http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d>`_.

    Args:
        data_root (str): Path of dataset root.
        ann_file (str): Path of annotation file.
        pipeline (List[dict]): Pipeline used for data processing.
            Defaults to [].
        modality (dict): Modality to specify the sensor data used as input.
            Defaults to dict(use_lidar=True).
        default_cam_key (str): The default camera name adopted.
            Defaults to 'CAM2'.
        load_type (str): Type of loading mode. Defaults to 'frame_based'.

            - 'frame_based': Load all of the instances in the frame.
            - 'mv_image_based': Load all of the instances in the frame and
              need to convert to the FOV-based data type to support
              image-based detector.
            - 'fov_image_based': Only load the instances inside the default
              cam, and need to convert to the FOV-based data type to support
              image-based detector.
        box_type_3d (str): Type of 3D box of this dataset.
            Based on the `box_type_3d`, the dataset will encapsulate the box
            to its original format then converted them to `box_type_3d`.
            Defaults to 'LiDAR' in this dataset. Available options includes:

            - 'LiDAR': Box in LiDAR coordinates.
            - 'Depth': Box in depth coordinates, usually for indoor dataset.
            - 'Camera': Box in camera coordinates.
        filter_empty_gt (bool): Whether to filter the data with empty GT.
            If it's set to be True, the example with empty annotations after
            data pipeline will be dropped and a random example will be chosen
            in `__getitem__`. Defaults to True.
        test_mode (bool): Whether the dataset is in test mode.
            Defaults to False.
        pcd_limit_range (List[float]): The range of point cloud used to filter
            invalid predicted boxes.
            Defaults to [0, -40, -3, 70.4, 40, 0.0].
    """
    # TODO: use full classes of kitti
    METAINFO = {
        'classes': ('Pedestrian', 'Cyclist', 'Car', 'Van', 'Truck',
                    'Person_sitting', 'Tram', 'Misc')
    }

    def __init__(self,
                 data_root: str,
                 ann_file: str,
                 pipeline: List[Union[dict, Callable]] = [],
                 modality: dict = dict(use_lidar=True),
                 default_cam_key: str = 'CAM2',
                 load_type: str = 'frame_based',
                 box_type_3d: str = 'LiDAR',
                 filter_empty_gt: bool = True,
                 test_mode: bool = False,
                 pcd_limit_range: List[float] = [0, -40, -3, 70.4, 40, 0.0],
                 **kwargs) -> None:
        self.pcd_limit_range = pcd_limit_range
        assert load_type in ('frame_based', 'mv_image_based',
                             'fov_image_based')
        self.load_type = load_type
        super().__init__(
            data_root=data_root,
            ann_file=ann_file,
            pipeline=pipeline,
            modality=modality,
            default_cam_key=default_cam_key,
            box_type_3d=box_type_3d,
            filter_empty_gt=filter_empty_gt,
            test_mode=test_mode,
            **kwargs)
        assert self.modality is not None
        assert box_type_3d.lower() in ('lidar', 'camera')

    def parse_data_info(self, info: dict) -> dict:
        """Process the raw data info.

        The only difference with it in `Det3DDataset`
        is the specific process for `plane`.

        Args:
            info (dict): Raw info dict.

        Returns:
            dict: Has `ann_info` in training stage. And
            all path has been converted to absolute path.
        """
        if self.modality['use_lidar']:
            if 'plane' in info:
                # convert ground plane to velodyne coordinates
                plane = np.array(info['plane'])
                lidar2cam = np.array(
                    info['images']['CAM2']['lidar2cam'], dtype=np.float32)
                reverse = np.linalg.inv(lidar2cam)

                (plane_norm_cam, plane_off_cam) = (plane[:3],
                                                   -plane[:3] * plane[3])
                plane_norm_lidar = \
                    (reverse[:3, :3] @ plane_norm_cam[:, None])[:, 0]
                plane_off_lidar = (
                    reverse[:3, :3] @ plane_off_cam[:, None][:, 0] +
                    reverse[:3, 3])
                plane_lidar = np.zeros_like(plane_norm_lidar, shape=(4, ))
                plane_lidar[:3] = plane_norm_lidar
                plane_lidar[3] = -plane_norm_lidar.T @ plane_off_lidar
            else:
                plane_lidar = None

            info['plane'] = plane_lidar

        if self.load_type == 'fov_image_based' and self.load_eval_anns:
            info['instances'] = info['cam_instances'][self.default_cam_key]

        info = super().parse_data_info(info)

        return info

    def parse_ann_info(self, info: dict) -> dict:
        """Process the `instances` in data info to `ann_info`.

        Args:
            info (dict): Data information of single data sample.

        Returns:
            dict: Annotation information consists of the following keys:

                - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
                  3D ground truth bboxes.
                - bbox_labels_3d (np.ndarray): Labels of ground truths.
                - gt_bboxes (np.ndarray): 2D ground truth bboxes.
                - gt_labels (np.ndarray): Labels of ground truths.
                - difficulty (int): Difficulty defined by KITTI.
                  0, 1, 2 represent xxxxx respectively.
        """
        ann_info = super().parse_ann_info(info)
        if ann_info is None:
            ann_info = dict()
            # empty instance
            ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
            ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)

            if self.load_type in ['fov_image_based', 'mv_image_based']:
                ann_info['gt_bboxes'] = np.zeros((0, 4), dtype=np.float32)
                ann_info['gt_bboxes_labels'] = np.array(0, dtype=np.int64)
                ann_info['centers_2d'] = np.zeros((0, 2), dtype=np.float32)
                ann_info['depths'] = np.zeros((0), dtype=np.float32)

        ann_info = self._remove_dontcare(ann_info)
        # in kitti, lidar2cam = R0_rect @ Tr_velo_to_cam
        lidar2cam = np.array(info['images']['CAM2']['lidar2cam'])
        # convert gt_bboxes_3d to velodyne coordinates with `lidar2cam`
        gt_bboxes_3d = CameraInstance3DBoxes(
            ann_info['gt_bboxes_3d']).convert_to(self.box_mode_3d,
                                                 np.linalg.inv(lidar2cam))
        ann_info['gt_bboxes_3d'] = gt_bboxes_3d
        return ann_info
```
Walking through parse_data_info():

- self.modality['use_lidar'] is True, so we enter the first if branch; the raw info dict contains a 'plane' entry, so we also enter the second if branch.
- info['plane'] is converted to the NumPy array plane, and info['images']['CAM2']['lidar2cam'] to the NumPy array lidar2cam.
- Matrix inversion of lidar2cam gives the NumPy array reverse.
- plane yields two new arrays: the plane normal in camera coordinates, plane_norm_cam = plane[:3], and a point on the plane, plane_off_cam = -plane[:3] * plane[3].
- The 3×3 rotation block reverse[:3, :3] is matrix-multiplied with plane_norm_cam[:, None] (shape 3×1), giving a 3×1 array; indexing with [:, 0] takes column 0, flattening it into the length-3 vector plane_norm_lidar.
- Similarly, reverse[:3, :3] is multiplied with plane_off_cam, and the translation reverse[:3, 3] (the first 3 entries of column 3 of reverse) is added, giving the new array plane_off_lidar.
- plane_lidar is created as an all-zero array of shape (4,) with the same dtype as plane_norm_lidar; its first 3 entries are set from plane_norm_lidar, and its 4th entry is set to the scalar -plane_norm_lidar.T @ plane_off_lidar.
- Had the raw info contained no 'plane', the else branch would set plane_lidar to None instead.
- info['plane'] is then overwritten with plane_lidar.
- self.load_type is 'frame_based', so the next if condition fails and its body is skipped.
- Finally, super().parse_data_info(info) initializes the remaining fields of info.

A standalone numerical sketch of the plane conversion follows.
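To see the geometry behind the plane steps, here is a standalone NumPy sketch (my own illustration with a made-up lidar2cam extrinsic) converting a camera-frame ground plane, parameterized as n·x + d = 0, into LiDAR coordinates the same way the code does:

```python
import numpy as np

# hypothetical camera-frame ground plane: normal n, offset d (n.x + d = 0)
plane = np.array([0.0, 1.0, 0.0, -1.65])  # ground ~1.65 m below the camera

# made-up lidar -> camera extrinsic: axis permutation plus a small translation
lidar2cam = np.array([[0., -1., 0., 0.0],
                      [0., 0., -1., -0.08],
                      [1., 0., 0., -0.27],
                      [0., 0., 0., 1.0]], dtype=np.float32)
reverse = np.linalg.inv(lidar2cam)  # camera -> lidar

plane_norm_cam = plane[:3]
plane_off_cam = -plane[:3] * plane[3]        # a point on the plane: -n * d
# rotate the normal; rotate and translate the point
plane_norm_lidar = (reverse[:3, :3] @ plane_norm_cam[:, None])[:, 0]
plane_off_lidar = reverse[:3, :3] @ plane_off_cam + reverse[:3, 3]

plane_lidar = np.zeros(4)
plane_lidar[:3] = plane_norm_lidar
plane_lidar[3] = -plane_norm_lidar @ plane_off_lidar  # new offset d'
print(plane_lidar)
```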
Turning to super().parse_data_info(): the base class Det3DDataset is implemented in mmdetection3d/mmdet3d/datasets/det3d_dataset.py:
```python
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
from os import path as osp
from typing import Callable, List, Optional, Set, Union

import numpy as np
import torch
from mmengine.dataset import BaseDataset
from mmengine.logging import print_log
from terminaltables import AsciiTable

from mmdet3d.registry import DATASETS
from mmdet3d.structures import get_box_type


@DATASETS.register_module()
class Det3DDataset(BaseDataset):
    """Base Class of 3D dataset.

    This is the base dataset of SUNRGB-D, ScanNet, nuScenes, and KITTI
    dataset.
    # TODO: doc link here for the standard data format

    Args:
        data_root (str, optional): The root directory for ``data_prefix`` and
            ``ann_file``. Defaults to None.
        ann_file (str): Annotation file path. Defaults to ''.
        metainfo (dict, optional): Meta information for dataset, such as class
            information. Defaults to None.
        data_prefix (dict): Prefix for training data. Defaults to
            dict(pts='velodyne', img='').
        pipeline (List[dict]): Pipeline used for data processing.
            Defaults to [].
        modality (dict): Modality to specify the sensor data used as input,
            it usually has following keys:

                - use_camera: bool
                - use_lidar: bool
            Defaults to dict(use_lidar=True, use_camera=False).
        default_cam_key (str, optional): The default camera name adopted.
            Defaults to None.
        box_type_3d (str): Type of 3D box of this dataset.
            Based on the `box_type_3d`, the dataset will encapsulate the box
            to its original format then converted them to `box_type_3d`.
            Defaults to 'LiDAR' in this dataset. Available options includes:

            - 'LiDAR': Box in LiDAR coordinates, usually for
              outdoor point cloud 3d detection.
            - 'Depth': Box in depth coordinates, usually for
              indoor point cloud 3d detection.
            - 'Camera': Box in camera coordinates, usually
              for vision-based 3d detection.
        filter_empty_gt (bool): Whether to filter the data with empty GT.
            If it's set to be True, the example with empty annotations after
            data pipeline will be dropped and a random example will be chosen
            in `__getitem__`. Defaults to True.
        test_mode (bool): Whether the dataset is in test mode.
            Defaults to False.
        load_eval_anns (bool): Whether to load annotations in test_mode,
            the annotation will be save in `eval_ann_infos`, which can be
            used in Evaluator. Defaults to True.
        backend_args (dict, optional): Arguments to instantiate the
            corresponding backend. Defaults to None.
        show_ins_var (bool): For debug purpose. Whether to show variation
            of the number of instances before and after through pipeline.
            Defaults to False.
    """

    def __init__(self,
                 data_root: Optional[str] = None,
                 ann_file: str = '',
                 metainfo: Optional[dict] = None,
                 data_prefix: dict = dict(pts='velodyne', img=''),
                 pipeline: List[Union[dict, Callable]] = [],
                 modality: dict = dict(use_lidar=True, use_camera=False),
                 default_cam_key: str = None,
                 box_type_3d: dict = 'LiDAR',
                 filter_empty_gt: bool = True,
                 test_mode: bool = False,
                 load_eval_anns: bool = True,
                 backend_args: Optional[dict] = None,
                 show_ins_var: bool = False,
                 **kwargs) -> None:
        self.backend_args = backend_args
        self.filter_empty_gt = filter_empty_gt
        self.load_eval_anns = load_eval_anns

        _default_modality_keys = ('use_lidar', 'use_camera')
        if modality is None:
            modality = dict()

        # Defaults to False if not specify
        for key in _default_modality_keys:
            if key not in modality:
                modality[key] = False
        self.modality = modality
        self.default_cam_key = default_cam_key
        assert self.modality['use_lidar'] or self.modality['use_camera'], (
            'Please specify the `modality` (`use_lidar` '
            f', `use_camera`) for {self.__class__.__name__}')

        self.box_type_3d, self.box_mode_3d = get_box_type(box_type_3d)

        if metainfo is not None and 'classes' in metainfo:
            # we allow to train on subset of self.METAINFO['classes']
            # map unselected labels to -1
            self.label_mapping = {
                i: -1
                for i in range(len(self.METAINFO['classes']))
            }
            self.label_mapping[-1] = -1
            for label_idx, name in enumerate(metainfo['classes']):
                ori_label = self.METAINFO['classes'].index(name)
                self.label_mapping[ori_label] = label_idx

            self.num_ins_per_cat = {name: 0 for name in metainfo['classes']}
        else:
            self.label_mapping = {
                i: i
                for i in range(len(self.METAINFO['classes']))
            }
            self.label_mapping[-1] = -1

            self.num_ins_per_cat = {
                name: 0
                for name in self.METAINFO['classes']
            }

        super().__init__(
            ann_file=ann_file,
            metainfo=metainfo,
            data_root=data_root,
            data_prefix=data_prefix,
            pipeline=pipeline,
            test_mode=test_mode,
            **kwargs)

        # can be accessed by other component in runner
        self.metainfo['box_type_3d'] = box_type_3d
        self.metainfo['label_mapping'] = self.label_mapping

        # used for showing variation of the number of instances before and
        # after through the pipeline
        self.show_ins_var = show_ins_var

        # show statistics of this dataset
        print_log('-' * 30, 'current')
        print_log(f'The length of the dataset: {len(self)}', 'current')
        content_show = [['category', 'number']]
        for cat_name, num in self.num_ins_per_cat.items():
            content_show.append([cat_name, num])
        table = AsciiTable(content_show)
        print_log(
            f'The number of instances per category in the dataset:\n{table.table}',  # noqa: E501
            'current')

    def _remove_dontcare(self, ann_info: dict) -> dict:
        """Remove annotations that do not need to be cared.

        -1 indicates dontcare in MMDet3d.

        Args:
            ann_info (dict): Dict of annotation infos. The
                instance with label `-1` will be removed.

        Returns:
            dict: Annotations after filtering.
        """
        img_filtered_annotations = {}
        filter_mask = ann_info['gt_labels_3d'] > -1
        for key in ann_info.keys():
            if key != 'instances':
                img_filtered_annotations[key] = (ann_info[key][filter_mask])
            else:
                img_filtered_annotations[key] = ann_info[key]
        return img_filtered_annotations

    def get_ann_info(self, index: int) -> dict:
        """Get annotation info according to the given index.

        Use index to get the corresponding annotations, thus the
        evalhook could use this api.

        Args:
            index (int): Index of the annotation data to get.

        Returns:
            dict: Annotation information.
        """
        data_info = self.get_data_info(index)
        # test model
        if 'ann_info' not in data_info:
            ann_info = self.parse_ann_info(data_info)
        else:
            ann_info = data_info['ann_info']

        return ann_info

    def parse_ann_info(self, info: dict) -> Union[dict, None]:
        """Process the `instances` in data info to `ann_info`.

        In `Custom3DDataset`, we simply concatenate all the field
        in `instances` to `np.ndarray`, you can do the specific
        process in subclass. You have to convert `gt_bboxes_3d`
        to different coordinates according to the task.

        Args:
            info (dict): Info dict.

        Returns:
            dict or None: Processed `ann_info`.
        """
        # add s or gt prefix for most keys after concat
        # we only process 3d annotations here, the corresponding
        # 2d annotation process is in the `LoadAnnotations3D`
        # in `transforms`
        name_mapping = {
            'bbox_label_3d': 'gt_labels_3d',
            'bbox_label': 'gt_bboxes_labels',
            'bbox': 'gt_bboxes',
            'bbox_3d': 'gt_bboxes_3d',
            'depth': 'depths',
            'center_2d': 'centers_2d',
            'attr_label': 'attr_labels',
            'velocity': 'velocities',
        }
        instances = info['instances']
        # empty gt
        if len(instances) == 0:
            return None
        else:
            keys = list(instances[0].keys())
            ann_info = dict()
            for ann_name in keys:
                temp_anns = [item[ann_name] for item in instances]
                # map the original dataset label to training label
                if 'label' in ann_name and ann_name != 'attr_label':
                    temp_anns = [
                        self.label_mapping[item] for item in temp_anns
                    ]
                if ann_name in name_mapping:
                    mapped_ann_name = name_mapping[ann_name]
                else:
                    mapped_ann_name = ann_name

                if 'label' in ann_name:
                    temp_anns = np.array(temp_anns).astype(np.int64)
                elif ann_name in name_mapping:
                    temp_anns = np.array(temp_anns).astype(np.float32)
                else:
                    temp_anns = np.array(temp_anns)
                ann_info[mapped_ann_name] = temp_anns
            ann_info['instances'] = info['instances']

            for label in ann_info['gt_labels_3d']:
                if label != -1:
                    cat_name = self.metainfo['classes'][label]
                    self.num_ins_per_cat[cat_name] += 1

        return ann_info

    def parse_data_info(self, info: dict) -> dict:
        """Process the raw data info.

        Convert all relative path of needed modality data file to
        the absolute path. And process
        the `instances` field to `ann_info` in training stage.

        Args:
            info (dict): Raw info dict.

        Returns:
            dict: Has `ann_info` in training stage. And
            all path has been converted to absolute path.
        """
        if self.modality['use_lidar']:
            info['lidar_points']['lidar_path'] = \
                osp.join(
                    self.data_prefix.get('pts', ''),
                    info['lidar_points']['lidar_path'])

            info['num_pts_feats'] = info['lidar_points']['num_pts_feats']
            info['lidar_path'] = info['lidar_points']['lidar_path']
            if 'lidar_sweeps' in info:
                for sweep in info['lidar_sweeps']:
                    file_suffix = sweep['lidar_points']['lidar_path'].split(
                        os.sep)[-1]
                    if 'samples' in sweep['lidar_points']['lidar_path']:
                        sweep['lidar_points']['lidar_path'] = osp.join(
                            self.data_prefix['pts'], file_suffix)
                    else:
                        sweep['lidar_points']['lidar_path'] = osp.join(
                            self.data_prefix['sweeps'], file_suffix)

        if self.modality['use_camera']:
            for cam_id, img_info in info['images'].items():
                if 'img_path' in img_info:
                    if cam_id in self.data_prefix:
                        cam_prefix = self.data_prefix[cam_id]
                    else:
                        cam_prefix = self.data_prefix.get('img', '')
                    img_info['img_path'] = osp.join(cam_prefix,
                                                    img_info['img_path'])
            if self.default_cam_key is not None:
                info['img_path'] = info['images'][
                    self.default_cam_key]['img_path']
                if 'lidar2cam' in info['images'][self.default_cam_key]:
                    info['lidar2cam'] = np.array(
                        info['images'][self.default_cam_key]['lidar2cam'])
                if 'cam2img' in info['images'][self.default_cam_key]:
                    info['cam2img'] = np.array(
                        info['images'][self.default_cam_key]['cam2img'])
                if 'lidar2img' in info['images'][self.default_cam_key]:
                    info['lidar2img'] = np.array(
                        info['images'][self.default_cam_key]['lidar2img'])
                else:
                    info['lidar2img'] = info['cam2img'] @ info['lidar2cam']

        if not self.test_mode:
            # used in training
            info['ann_info'] = self.parse_ann_info(info)
        if self.test_mode and self.load_eval_anns:
            info['eval_ann_info'] = self.parse_ann_info(info)

        return info

    def _show_ins_var(self, old_labels: np.ndarray,
                      new_labels: torch.Tensor) -> None:
        """Show variation of the number of instances before and after through
        the pipeline.

        Args:
            old_labels (np.ndarray): The labels before through the pipeline.
            new_labels (torch.Tensor): The labels after through the pipeline.
        """
        ori_num_per_cat = dict()
        for label in old_labels:
            if label != -1:
                cat_name = self.metainfo['classes'][label]
                ori_num_per_cat[cat_name] = ori_num_per_cat.get(cat_name,
                                                                0) + 1
        new_num_per_cat = dict()
        for label in new_labels:
            if label != -1:
                cat_name = self.metainfo['classes'][label]
                new_num_per_cat[cat_name] = new_num_per_cat.get(cat_name,
                                                                0) + 1
        content_show = [['category', 'new number', 'ori number']]
        for cat_name, num in ori_num_per_cat.items():
            new_num = new_num_per_cat.get(cat_name, 0)
            content_show.append([cat_name, new_num, num])
        table = AsciiTable(content_show)
        print_log(
            'The number of instances per category after and before '
            f'through pipeline:\n{table.table}', 'current')

    def prepare_data(self, index: int) -> Union[dict, None]:
        """Data preparation for both training and testing stage.

        Called by `__getitem__` of dataset.

        Args:
            index (int): Index for accessing the target data.

        Returns:
            dict or None: Data dict of the corresponding index.
        """
        ori_input_dict = self.get_data_info(index)

        # deepcopy here to avoid inplace modification in pipeline.
        input_dict = copy.deepcopy(ori_input_dict)

        # box_type_3d (str): 3D box type.
        input_dict['box_type_3d'] = self.box_type_3d
        # box_mode_3d (str): 3D box mode.
        input_dict['box_mode_3d'] = self.box_mode_3d

        # pre-pipline return None to random another in `__getitem__`
        if not self.test_mode and self.filter_empty_gt:
            if len(input_dict['ann_info']['gt_labels_3d']) == 0:
                return None

        example = self.pipeline(input_dict)

        if not self.test_mode and self.filter_empty_gt:
            # after pipeline drop the example with empty annotations
            # return None to random another in `__getitem__`
            if example is None or len(
                    example['data_samples'].gt_instances_3d.labels_3d) == 0:
                return None

        if self.show_ins_var:
            if 'ann_info' in ori_input_dict:
                self._show_ins_var(
                    ori_input_dict['ann_info']['gt_labels_3d'],
                    example['data_samples'].gt_instances_3d.labels_3d)
            else:
                print_log(
                    "'ann_info' is not in the input dict. It's probably that "
                    'the data is not in training mode',
                    'current',
                    level=30)

        return example

    def get_cat_ids(self, idx: int) -> Set[int]:
        """Get category ids by index. Dataset wrapped by ClassBalancedDataset
        must implement this method.

        The ``CBGSDataset`` or ``ClassBalancedDataset`` requires a subclass
        which implements this method.

        Args:
            idx (int): The index of data.

        Returns:
            set[int]: All categories in the sample of specified index.
        """
        info = self.get_data_info(idx)
        gt_labels = info['ann_info']['gt_labels_3d'].tolist()
        return set(gt_labels)
```
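One detail in __init__ above worth pausing on: because our config's metainfo selects only three of KITTI's eight classes, label_mapping sends the unselected original labels to -1. A quick re-enactment of that logic (values worked out from METAINFO and class_names):

```python
METAINFO_classes = ('Pedestrian', 'Cyclist', 'Car', 'Van', 'Truck',
                    'Person_sitting', 'Tram', 'Misc')
selected = ['Pedestrian', 'Cyclist', 'Car']  # class_names in the config

label_mapping = {i: -1 for i in range(len(METAINFO_classes))}
label_mapping[-1] = -1
for train_label, name in enumerate(selected):
    label_mapping[METAINFO_classes.index(name)] = train_label

print(label_mapping)
# {0: 0, 1: 1, 2: 2, 3: -1, 4: -1, 5: -1, 6: -1, 7: -1, -1: -1}
# Van/Truck/... map to -1 and are later dropped by _remove_dontcare()
```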
Walking through Det3DDataset.parse_data_info():

- self.modality['use_lidar'] is True, so we enter the first if branch; osp.join prepends the 'pts' data prefix, so info['lidar_points']['lidar_path'] changes from the bare file name to 'data/kitti/training/velodyne_reduced/000000.bin'.
- info['num_pts_feats'] is set to 4, and info['lidar_path'] to 'data/kitti/training/velodyne_reduced/000000.bin'.
- info has no 'lidar_sweeps' field, so that if branch is skipped.
- self.modality['use_camera'] is False, so the camera branch is skipped.
- self.test_mode is False, so `if not self.test_mode` holds and parse_ann_info(info) initializes info['ann_info'].

KittiDataset.parse_ann_info() in turn first calls super().parse_ann_info(info) to initialize ann_info, landing in Det3DDataset.parse_ann_info(). Stepping through it (line numbers refer to det3d_dataset.py):
- Line 217 sets up the name_mapping dict, which renames raw annotation keys to their gt_-prefixed training names.
- Line 227 sets instances from info['instances']; info['instances'] is a list of dicts, each dict holding all annotation fields of a single instance.
- Line 229: len(instances) is not 0, so the if condition fails and we fall into the else branch.
- Line 232 sets keys to the list of keys of instances[0], via instances[0].keys().
- Line 233 initializes ann_info as an empty dict.
- Line 234 enters the for loop; we take ann_name = 'bbox' as the running example, the other fields follow the same pattern.
- Line 235 builds temp_anns with the list comprehension [item['bbox'] for item in instances], collecting the 'bbox' value of every instance.
- Line 237: ann_name = 'bbox' does not contain 'label', so the label-mapping branch is skipped.
- Line 241: name_mapping contains 'bbox', so line 242 sets mapped_ann_name = name_mapping['bbox'] = 'gt_bboxes'.
- Line 246: 'bbox' does not contain 'label', so that branch is skipped; line 248: 'bbox' is in name_mapping, so line 249 converts temp_anns into an np.float32 NumPy array.
- Line 253 stores the field in the dict: ann_info[mapped_ann_name] = ann_info['gt_bboxes'] = temp_anns.
- After the loop at line 234 finishes over all keys, ann_info holds one NumPy array per annotation field under its mapped name.
- Line 254 adds the field ann_info['instances'] = info['instances'].
- Line 256 enters the for loop over ann_info['gt_labels_3d'], which here is [0].
- Line 257: label = 0, so the condition label != -1 holds; line 258 sets cat_name = self.metainfo['classes'][0] = 'Pedestrian', and line 259 increments self.num_ins_per_cat[cat_name] by 1.
- Since this first frame contains only a single pedestrian, the loop at line 256 increments only self.num_ins_per_cat['Pedestrian'].
- Line 261 returns ann_info to the caller. (A self-contained re-enactment of this conversion follows below.)
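Here is a minimal re-enactment of that loop (standalone, with hypothetical instance values rather than real KITTI annotations), showing how a one-instance list becomes the ann_info arrays:

```python
import numpy as np

name_mapping = {'bbox_label_3d': 'gt_labels_3d', 'bbox': 'gt_bboxes',
                'bbox_3d': 'gt_bboxes_3d', 'bbox_label': 'gt_bboxes_labels'}
label_mapping = {0: 0, 1: 1, 2: 2, -1: -1}  # 3-class training subset

# one pedestrian, with made-up numbers in KITTI camera coordinates
instances = [dict(
    bbox=[712.4, 143.0, 810.7, 307.9],                  # 2D box, x1 y1 x2 y2
    bbox_3d=[1.84, 1.47, 8.41, 1.2, 0.48, 1.89, 0.01],  # x y z dims yaw
    bbox_label=0,
    bbox_label_3d=0)]

ann_info = dict()
for ann_name in instances[0].keys():
    temp_anns = [item[ann_name] for item in instances]
    if 'label' in ann_name:  # map raw labels to training labels
        temp_anns = [label_mapping[item] for item in temp_anns]
        temp_anns = np.array(temp_anns).astype(np.int64)
    else:
        temp_anns = np.array(temp_anns).astype(np.float32)
    ann_info[name_mapping.get(ann_name, ann_name)] = temp_anns

print({k: (v.shape, v.dtype) for k, v in ann_info.items()})
# gt_bboxes (1, 4) float32, gt_bboxes_3d (1, 7) float32,
# gt_bboxes_labels (1,) int64, gt_labels_3d (1,) int64
```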
Back in KittiDataset.parse_ann_info() (line numbers refer to kitti_dataset.py):

- Line 150 calls the base class Det3DDataset's super().parse_ann_info(info), producing the ann_info built above.
- Line 151: ann_info is not None, so the if branch that would create empty placeholder arrays is skipped.
- Line 163 runs ann_info through _remove_dontcare(), inherited from Det3DDataset, to drop instances labeled -1; here ann_info passes through unchanged.
- Line 165 converts the list info['images']['CAM2']['lidar2cam'] into the NumPy array lidar2cam.
- Line 167 builds a 3D box object from ann_info['gt_bboxes_3d'] via CameraInstance3DBoxes and converts it into the LiDAR coordinate system; for the coordinate systems and Box structures at the core of MMDetection3D, see the official tutorial on that topic.
- Line 170 assigns the resulting 3D boxes gt_bboxes_3d back to ann_info['gt_bboxes_3d'].
- Line 171 returns ann_info to the caller.
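For intuition, a hedged sketch of that conversion step in isolation; CameraInstance3DBoxes, Box3DMode and convert_to are real MMDetection3D APIs, while the extrinsic matrix and box values below are made up:

```python
import numpy as np
from mmdet3d.structures import Box3DMode, CameraInstance3DBoxes

# made-up KITTI-style extrinsic (lidar -> CAM2), and one camera-frame box
lidar2cam = np.array([[0., -1., 0., 0.0],
                      [0., 0., -1., -0.08],
                      [1., 0., 0., -0.27],
                      [0., 0., 0., 1.0]], dtype=np.float32)
cam_boxes = CameraInstance3DBoxes(
    np.array([[1.84, 1.47, 8.41, 1.2, 1.89, 0.48, 0.01]], dtype=np.float32))

# convert to LiDAR coordinates using the inverse extrinsic,
# the same call pattern parse_ann_info() uses
lidar_boxes = cam_boxes.convert_to(Box3DMode.LIDAR, np.linalg.inv(lidar2cam))
print(lidar_boxes.tensor)
```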
Back in Det3DDataset.parse_data_info() (line numbers refer to det3d_dataset.py):

- Line 323 dispatches to the derived class KittiDataset's parse_ann_info(info), which fills info['ann_info'] as just described.
- Line 324: self.test_mode is False, so the `if self.test_mode and self.load_eval_anns` condition fails and its body is skipped.
- Line 327 returns the completed info to the caller.

Finally, back in KittiDataset.parse_data_info(), super().parse_data_info(info) has now initialized the remaining data of info. The seven steps above produce the data info for a single point cloud, 'data/kitti/training/velodyne_reduced/000000.bin'; iterating over every point cloud in the split and collecting the resulting info dicts yields data_list. Each entry of data_list is one such info, and any processed item can be inspected, as the sketch below shows.
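A short sketch of building the dataset from its config and inspecting one processed item (this assumes the KITTI info .pkl files were already generated with MMDetection3D's data-preparation script):

```python
from mmengine.registry import init_default_scope
from mmdet3d.registry import DATASETS

init_default_scope('mmdet3d')  # unprefixed registry names resolve to mmdet3d

dataset = DATASETS.build(
    dict(
        type='KittiDataset',
        data_root='data/kitti/',
        ann_file='kitti_infos_train.pkl',
        data_prefix=dict(pts='training/velodyne_reduced'),
        pipeline=[],              # no transforms: look at the raw parsed info
        modality=dict(use_lidar=True, use_camera=False),
        metainfo=dict(classes=['Pedestrian', 'Cyclist', 'Car'])))

info = dataset.get_data_info(0)   # the parsed info for 000000.bin
print(info['lidar_path'])
print(info['ann_info'].keys())    # gt_bboxes_3d, gt_labels_3d, ...
```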
With that, the data-processing part is complete; next we move on to the network structure of PointPillars.