Waymo Open Dataset (WOD) Perception Dataset: Data Format

This document is a partial translation of the original post, which is in Korean; it was machine-translated (via Google Translate) without human review, so read it critically. What follows is the original author's walkthrough of the dataset.proto file.

Data format

Every piece of data is contained in a Frame message.

Because the messages inside a Frame depend on one another, we explain everything from the basic messages up to Frame itself. Note that the listings below are annotated .proto definitions: the `= N` after each field is a protobuf field tag, not a value, and is unrelated to the field's data type.

message Label {
  // Upright box, zero pitch and roll.
  message Box {
    // Box coordinates in vehicle frame.
    optional double center_x = 1;
    optional double center_y = 2;
    optional double center_z = 3;

    // Dimensions of the box. length: dim x. width: dim y. height: dim z.
    optional double length = 5;
    optional double width = 4;
    optional double height = 6;

    // The heading of the bounding box (in radians). The heading is the angle
    // required to rotate +x to the surface normal of the box front face. It is
    // normalized to [-pi, pi).
    optional double heading = 7;

    enum Type {
      TYPE_UNKNOWN = 0;
      // 7-DOF 3D (a.k.a upright 3D box).
      TYPE_3D = 1;
      // 5-DOF 2D. Mostly used for laser top down representation.
      TYPE_2D = 2;
      // Axis aligned 2D. Mostly used for image.
      TYPE_AA_2D = 3;
    }
  }
  optional Box box = 1;

  message Metadata {
    optional double speed_x = 1;
    optional double speed_y = 2;
    optional double accel_x = 3;
    optional double accel_y = 4;
  }
  optional Metadata metadata = 2;

  enum Type {
    TYPE_UNKNOWN = 0;
    TYPE_VEHICLE = 1;
    TYPE_PEDESTRIAN = 2;
    TYPE_SIGN = 3;
    TYPE_CYCLIST = 4;
  }
  optional Type type = 3;

  // Object ID.
  optional string id = 4;

  // The difficulty level of this label. The higher the level, the harder it is.
  enum DifficultyLevel {
    UNKNOWN = 0;
    LEVEL_1 = 1;
    LEVEL_2 = 2;
  }
  // Difficulty level for detection problem.
  optional DifficultyLevel detection_difficulty_level = 5;
  // Difficulty level for tracking problem.
  optional DifficultyLevel tracking_difficulty_level = 6;

  // The total number of lidar points in this box.
  optional int32 num_lidar_points_in_box = 7;
}

Label is the message type for an annotated object. It includes the box's position and type, the object's type, the number of lidar points inside the box, and the detection/tracking difficulty levels.
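To make the box geometry concrete, here is a minimal Python sketch (plain arguments stand in for the `Label.Box` fields; this is not the generated protobuf class) that computes the four bird's-eye-view corners of an upright box from its center, dimensions, and heading:

```python
import math

def box_corners_bev(center_x, center_y, length, width, heading):
    """Return the 4 bird's-eye-view corners of an upright 3D box.

    length is the extent along the box's +x (heading) axis, width along +y.
    """
    c, s = math.cos(heading), math.sin(heading)
    half_l, half_w = length / 2.0, width / 2.0
    corners = []
    for dx, dy in [(half_l, half_w), (half_l, -half_w),
                   (-half_l, -half_w), (-half_l, half_w)]:
        # Rotate the box-frame offset by heading, then translate to the center.
        corners.append((center_x + c * dx - s * dy,
                        center_y + s * dx + c * dy))
    return corners

print(box_corners_bev(0.0, 0.0, 4.0, 2.0, 0.0))
```

With heading 0 the corners are axis-aligned; a nonzero heading rotates them counterclockwise about the center, matching the `+x to front face` convention in the comment above.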

message MatrixShape {
  // Dimensions for the Matrix messages defined below. Must not be empty.
  //
  // The order of entries in 'dims' matters, as it indicates the layout of the
  // values in the tensor in-memory representation.
  //
  // The first entry in 'dims' is the outermost dimension used to lay out the
  // values; the last entry is the innermost dimension. This matches the
  // in-memory layout of row-major matrices.
  repeated int32 dims = 1;
}

MatrixShape declares the shape of a matrix. A repeated `dims` field is used because the number of required entries grows with the number of dimensions.

// Row-major matrix.
// Requires: data.size() = product(shape.dims()).
message MatrixFloat {
  repeated float data = 1 [packed = true];
  optional MatrixShape shape = 2;
}

MatrixFloat stores the matrix values as floats, in row-major order, together with their shape.


MatrixInt32 stores the matrix values as 32-bit integers, in row-major order, together with their shape.

// Row-major matrix.
// Requires: data.size() = product(shape.dims()).
message MatrixInt32 {
  repeated int32 data = 1 [packed = true];
  optional MatrixShape shape = 2;
}
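The contract `data.size() = product(shape.dims())` maps directly onto a row-major reshape. A small sketch, assuming plain Python lists in place of the `data` and `shape.dims` protobuf fields:

```python
import numpy as np

def to_ndarray(data, dims):
    """Rebuild an n-d array from a flat row-major Matrix{Float,Int32} payload.

    `data` and `dims` stand in for matrix.data and matrix.shape.dims.
    """
    dims = list(dims)
    if len(data) != int(np.prod(dims)):
        raise ValueError("data.size() must equal product(shape.dims())")
    # Protobuf packs values row-major (last dim innermost), which matches
    # numpy's default 'C' memory order.
    return np.asarray(data).reshape(dims)

print(to_ndarray([1, 2, 3, 4, 5, 6], [2, 3]))
```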

CameraName identifies one of the cameras mounted on the vehicle.

message CameraName {
  enum Name {
    UNKNOWN = 0;
    FRONT = 1;
    FRONT_LEFT = 2;
    FRONT_RIGHT = 3;
    SIDE_LEFT = 4;
    SIDE_RIGHT = 5;
  }
}

LaserName identifies one of the lidars mounted on the vehicle.

// 'Laser' is used interchangeably with 'Lidar' in this file.
message LaserName {
  enum Name {
    UNKNOWN = 0;
    TOP = 1;
    FRONT = 2;
    SIDE_LEFT = 3;
    SIDE_RIGHT = 4;
    REAR = 5;
  }
}

Transform is a transformation matrix used to map 3D points from one frame to another (for example, from a sensor frame to the vehicle frame).
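The Transform message itself is not reproduced in this excerpt, but it appears as the type of the `extrinsic` and `pose` fields below. As a sketch, assuming the transform is stored as a 4x4 row-major matrix flattened into 16 doubles, applying it to a point looks like:

```python
import numpy as np

def apply_transform(matrix16, point_xyz):
    """Apply a 4x4 row-major transform (16 doubles) to a 3D point.

    matrix16 stands in for Transform.matrix; the last row is assumed to be
    (0, 0, 0, 1), i.e. a rigid transform in homogeneous coordinates.
    """
    t = np.asarray(matrix16, dtype=np.float64).reshape(4, 4)
    p = np.append(np.asarray(point_xyz, dtype=np.float64), 1.0)  # homogeneous
    return (t @ p)[:3]

# Pure translation by (1, 2, 3):
m = [1, 0, 0, 1,
     0, 1, 0, 2,
     0, 0, 1, 3,
     0, 0, 0, 1]
print(apply_transform(m, (0.0, 0.0, 0.0)))
```

Chaining such matrices (e.g. sensor-to-vehicle, then vehicle-to-global) is how points move between the frames described in this file.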

message Velocity {
  // Velocity in m/s.
  optional float v_x = 1;
  optional float v_y = 2;
  optional float v_z = 3;

  // Angular velocity in rad/s.
  optional double w_x = 4;
  optional double w_y = 5;
  optional double w_z = 6;
}

Velocity holds an object's linear velocity (m/s) and angular velocity (rad/s).

message CameraCalibration {
  optional CameraName.Name name = 1;
  // 1d Array of [f_u, f_v, c_u, c_v, k{1, 2}, p{1, 2}, k{3}].
  // Note that this intrinsic corresponds to the images after scaling.
  // Camera model: pinhole camera.
  // Lens distortion:
  //   Radial distortion coefficients: k1, k2, k3.
  //   Tangential distortion coefficients: p1, p2.
  // k_{1, 2, 3}, p_{1, 2} follows the same definition as OpenCV.
  // https://en.wikipedia.org/wiki/Distortion_(optics)
  // https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
  repeated double intrinsic = 2;
  // Camera frame to vehicle frame.
  optional Transform extrinsic = 3;
  // Camera image size.
  optional int32 width = 4;
  optional int32 height = 5;

  enum RollingShutterReadOutDirection {
    UNKNOWN = 0;
    TOP_TO_BOTTOM = 1;
    LEFT_TO_RIGHT = 2;
    BOTTOM_TO_TOP = 3;
    RIGHT_TO_LEFT = 4;
    GLOBAL_SHUTTER = 5;
  }
  optional RollingShutterReadOutDirection rolling_shutter_direction = 6;
}

CameraCalibration holds each camera's intrinsics (pinhole model plus distortion coefficients), its extrinsic transform from the camera frame to the vehicle frame, the image size, and the rolling-shutter readout direction.
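A minimal sketch of the pinhole model described by the `intrinsic` array. Lens distortion (k1, k2, k3, p1, p2) is ignored here, and the point is assumed to already be in a conventional optical frame with +z pointing out of the lens; the dataset's own camera frame axes differ, so a real projection needs an axis change first:

```python
def project_pinhole(intrinsic, x, y, z):
    """Project a 3D point (optical frame, +z forward) to pixel coordinates.

    intrinsic is the [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3] array from
    CameraCalibration; distortion terms are ignored in this sketch.
    """
    f_u, f_v, c_u, c_v = intrinsic[:4]
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = f_u * (x / z) + c_u  # pixel column
    v = f_v * (y / z) + c_v  # pixel row
    return u, v

print(project_pinhole([1000.0, 1000.0, 960.0, 640.0] + [0.0] * 5, 1.0, 0.5, 10.0))
```

A point projecting outside `[0, width) x [0, height)` falls off the image, which is exactly the case the projected-label clamping in Frame (below) has to handle.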

message LaserCalibration {
  optional LaserName.Name name = 1;
  // If non-empty, the beam pitch (in radians) is non-uniform. When constructing
  // a range image, this mapping is used to map from beam pitch to range image
  // row. If this is empty, we assume a uniform distribution.
  repeated double beam_inclinations = 2;
  // beam_inclination_{min,max} (in radians) are used to determine the mapping.
  optional double beam_inclination_min = 3;
  optional double beam_inclination_max = 4;
  // Lidar frame to vehicle frame.
  optional Transform extrinsic = 5;
}

LaserCalibration holds each lidar's beam inclinations (used to map beam pitch to range-image rows) and its extrinsic transform to the vehicle frame.
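When `beam_inclinations` is empty, a uniform distribution between `beam_inclination_min` and `beam_inclination_max` is assumed. A sketch of generating per-row inclinations under that assumption; the top-row-first ordering and center-of-interval sampling are this sketch's choices, not something the excerpt specifies:

```python
def uniform_inclinations(inclination_min, inclination_max, height):
    """Generate per-row beam inclinations (radians) for a range image when
    beam_inclinations is empty, i.e. a uniform distribution is assumed.

    Row 0 is taken to be the top of the range image (largest inclination).
    """
    step = (inclination_max - inclination_min) / height
    # Sample each beam at the center of its interval, top row first.
    return [inclination_max - (i + 0.5) * step for i in range(height)]

rows = uniform_inclinations(-0.3, 0.2, 5)
print([round(r, 3) for r in rows])
```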

message Context {
  // A unique name that identifies the frame sequence.
  optional string name = 1;
  repeated CameraCalibration camera_calibrations = 2;
  repeated LaserCalibration laser_calibrations = 3;

  // Some stats for the run segment used.
  message Stats {
    message ObjectCount {
      optional Label.Type type = 1;
      // The number of unique objects with the type in the segment.
      optional int32 count = 2;
    }
    repeated ObjectCount laser_object_counts = 1;
    repeated ObjectCount camera_object_counts = 5;
    // Day, Dawn/Dusk, or Night, determined from sun elevation.
    optional string time_of_day = 2;
    // Human readable location (e.g. CHD, SF) of the run segment.
    optional string location = 3;
    // Currently either Sunny or Rain.
    optional string weather = 4;
  }
  optional Stats stats = 4;
}

Context carries the name of the frame sequence, the camera and lidar calibrations in effect, the number of annotated objects per type, and run-segment stats such as time of day, location, and weather.

message RangeImage {
  // Zlib compressed [H, W, 4] serialized version of MatrixFloat.
  // To decompress:
  //   string val = ZlibDecompress(range_image_compressed);
  //   MatrixFloat range_image;
  //   range_image.ParseFromString(val);
  // Inner dimensions are:
  //   * channel 0: range
  //   * channel 1: intensity
  //   * channel 2: elongation
  //   * channel 3: is in any no label zone.
  optional bytes range_image_compressed = 2;

  // Lidar point to camera image projections. A point can be projected to
  // multiple camera images. We pick the first two at the following order:
  // [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT].
  //
  // Zlib compressed [H, W, 6] serialized version of MatrixInt32.
  // To decompress:
  //   string val = ZlibDecompress(camera_projection_compressed);
  //   MatrixInt32 camera_projection;
  //   camera_projection.ParseFromString(val);
  // Inner dimensions are:
  //   * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no
  //     projection.
  //   * channel 1: x (axis along image width)
  //   * channel 2: y (axis along image height)
  //   * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no
  //     projection.
  //   * channel 4: x (axis along image width)
  //   * channel 5: y (axis along image height)
  // Note: pixel 0 corresponds to the left edge of the first pixel in the image.
  optional bytes camera_projection_compressed = 3;

  // Zlib compressed [H, W, 6] serialized version of MatrixFloat.
  // To decompress:
  //   string val = ZlibDecompress(range_image_pose_compressed);
  //   MatrixFloat range_image_pose;
  //   range_image_pose.ParseFromString(val);
  // Inner dimensions are [roll, pitch, yaw, x, y, z] represents a transform
  // from vehicle frame to global frame for every range image pixel.
  // This is ONLY populated for the first return. The second return is assumed
  // to have exactly the same range_image_pose_compressed.
  //
  // The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations,
  // meaning that rotating from the navigation to vehicle frame consists of a
  // yaw, then pitch and finally roll rotation about the z, y and x axes
  // respectively. All rotations use the right hand rule and are positive
  // in the counter clockwise direction.
  optional bytes range_image_pose_compressed = 4;

  // Zlib compressed [H, W, 5] serialized version of MatrixFloat.
  // To decompress:
  //   string val = ZlibDecompress(range_image_flow_compressed);
  //   MatrixFloat range_image_flow;
  //   range_image_flow.ParseFromString(val);
  // Inner dimensions are [vx, vy, vz, pointwise class].
  //
  // If the point is not annotated with scene flow information, class is set
  // to -1. A point is not annotated if it is in a no-label zone or if its label
  // bounding box does not have a corresponding match in the previous frame,
  // making it infeasible to estimate the motion of the point.
  // Otherwise, (vx, vy, vz) are velocity along (x, y, z)-axis for this point
  // and class is set to one of the following values:
  //   -1: no-flow-label, the point has no flow information.
  //    0: unlabeled or "background", i.e., the point is not contained in a
  //       bounding box.
  //    1: vehicle, i.e., the point corresponds to a vehicle label box.
  //    2: pedestrian, i.e., the point corresponds to a pedestrian label box.
  //    3: sign, i.e., the point corresponds to a sign label box.
  //    4: cyclist, i.e., the point corresponds to a cyclist label box.
  optional bytes range_image_flow_compressed = 5;

  // Deprecated, do not use.
  optional MatrixFloat range_image = 1 [deprecated = true];
}

RangeImage stores the lidar data as a [H, W, C] image in which each channel is one feature (range, intensity, elongation, no-label-zone flag). Sibling fields hold per-pixel camera projections, per-pixel vehicle pose, and scene-flow annotations, each zlib-compressed. By selecting the appropriate channel you obtain the data you want.
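The decompression recipe in the comments translates to Python roughly as follows. Parsing the real payload requires the `MatrixFloat` class generated from this proto (e.g. via the waymo_open_dataset package); to stay self-contained, this sketch assumes the decompressed bytes are already a raw float32 buffer:

```python
import zlib
import numpy as np

def decode_range_image(compressed, height, width, channels=4):
    """Sketch of the decompression step for range_image_compressed.

    In the real dataset the decompressed bytes are a serialized MatrixFloat
    that must be parsed with the generated protobuf classes; here the payload
    is assumed to be a raw float32 buffer so the sketch stays self-contained.
    """
    raw = zlib.decompress(compressed)
    return np.frombuffer(raw, dtype=np.float32).reshape(height, width, channels)

# Round-trip a tiny synthetic 2x3x4 "range image":
ri = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
packed = zlib.compress(ri.tobytes())
decoded = decode_range_image(packed, 2, 3)
print(decoded[1, 2, 0])  # channel 0 (range) of the last pixel → 20.0
```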

// All timestamps in this proto are represented as seconds since Unix epoch.
message CameraImage {
  optional CameraName.Name name = 1;
  // JPEG image.
  optional bytes image = 2;
  // SDC pose.
  optional Transform pose = 3;
  // SDC velocity at 'pose_timestamp' below. The velocity value is represented
  // at *global* frame.
  // With this velocity, the pose can be extrapolated.
  // r(t+dt) = r(t) + dr/dt * dt where dr/dt = v_{x,y,z}.
  // dR(t)/dt = W*R(t) where W = SkewSymmetric(w_{x,y,z})
  // This differential equation solves to: R(t) = exp(Wt)*R(0) if W is constant.
  // When dt is small: R(t+dt) = (I+W*dt)R(t)
  // r(t) = (x(t), y(t), z(t)) is vehicle location at t in the global frame.
  // R(t) = Rotation Matrix (3x3) from the body frame to the global frame at t.
  // SkewSymmetric(x,y,z) is defined as the cross-product matrix in the
  // following:
  // https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication
  optional Velocity velocity = 4;
  // Timestamp of the `pose` above.
  optional double pose_timestamp = 5;

  // Rolling shutter params.
  // The following explanation assumes left->right rolling shutter.
  //
  // Rolling shutter cameras expose and read the image column by column, offset
  // by the read out time for each column. The desired timestamp for each column
  // is the middle of the exposure of that column as outlined below for an image
  // with 3 columns:
  // ------time------>
  // |---- exposure col 1----| read |
  // -------|---- exposure col 2----| read |
  // --------------|---- exposure col 3----| read |
  // ^trigger time                                 ^readout end time
  //             ^time for row 1 (= middle of exposure of row 1)
  //                    ^time image center (= middle of exposure of middle row)

  // Shutter duration in seconds. Exposure time per column.
  optional double shutter = 6;
  // Time when the sensor was triggered and when last readout finished.
  // The difference between trigger time and readout done time includes
  // the exposure time and the actual sensor readout time.
  optional double camera_trigger_time = 7;
  optional double camera_readout_done_time = 8;
}

CameraImage carries when and how a photo was taken: the JPEG image, the vehicle pose and velocity at capture time, and the camera's rolling-shutter timing parameters.
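The pose-extrapolation formulas in the `velocity` comment can be sketched directly with numpy; plain arrays stand in for the `Transform` and `Velocity` messages:

```python
import numpy as np

def skew(w):
    """SkewSymmetric(w): cross-product matrix of an angular velocity vector."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def extrapolate_pose(r, R, v, w, dt):
    """First-order pose extrapolation from the CameraImage.velocity comments:
    r(t+dt) = r(t) + v*dt and R(t+dt) = (I + W*dt) R(t), valid for small dt.
    All quantities are in the global frame.
    """
    r_next = np.asarray(r, dtype=float) + np.asarray(v, dtype=float) * dt
    R_next = (np.eye(3) + skew(w) * dt) @ np.asarray(R, dtype=float)
    return r_next, R_next

r2, R2 = extrapolate_pose([0, 0, 0], np.eye(3),
                          v=[10.0, 0.0, 0.0], w=[0.0, 0.0, 0.1], dt=0.01)
print(r2)
```

Note the `(I + W*dt)` update is only the small-`dt` approximation of `exp(W*dt)`; over longer intervals the rotation should be re-orthonormalized or computed with the matrix exponential.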

// The camera labels associated with a given camera image. This message
// indicates the ground truth information for the camera image
// recorded by the given camera. If there are no labeled objects in the image,
// then the labels field is empty.
message CameraLabels {
  optional CameraName.Name name = 1;
  repeated Label labels = 2;
}

CameraLabels indicates which annotations belong to which camera image.

message Laser {
  optional LaserName.Name name = 1;
  optional RangeImage ri_return1 = 2;
  optional RangeImage ri_return2 = 3;
}

Laser pairs a lidar with the range images of its first and second returns.

message Frame {
  // The following field numbers are reserved for third-party extensions. Users
  // may declare new fields in that range in their own .proto files without
  // having to edit the original file.
  extensions 1000 to max;

  // This context is the same for all frames belong to the same driving run
  // segment. Use context.name to identify frames belong to the same driving
  // segment. We do not store all frames from one driving segment in one proto
  // to avoid huge protos.
  optional Context context = 1;

  // Frame start time, which is the timestamp of the first top lidar spin
  // within this frame.
  optional int64 timestamp_micros = 2;

  // The vehicle pose.
  optional Transform pose = 3;

  repeated CameraImage images = 4;
  repeated Laser lasers = 5;
  repeated Label laser_labels = 6;

  // Lidar labels (laser_labels) projected to camera images. A projected
  // label is the smallest image axis aligned rectangle that can cover all
  // projected points from the 3d lidar label. The projected label is ignored if
  // the projection is fully outside a camera image. The projected label is
  // clamped to the camera image if it is partially outside.
  repeated CameraLabels projected_lidar_labels = 9;

  // NOTE: if a camera identified by CameraLabels.name has an entry in this
  // field, then it has been labeled, even though it is possible that there are
  // no labeled objects in the corresponding image, which is identified by a
  // zero sized CameraLabels.labels.
  repeated CameraLabels camera_labels = 8;

  // No label zones in the *global* frame.
  repeated Polygon2dProto no_label_zones = 7;
}

Frame combines all of the messages above into a single frame: the context, timestamp, vehicle pose, camera images, lidar returns, and labels are each stored in their own field.
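Since all frames of one driving segment share `context.name`, a typical first step after decoding is to group frames by segment and order them in time. A sketch with plain Python objects standing in for decoded `Frame` protos (the real ones come from parsing TFRecord files with the waymo_open_dataset package):

```python
from collections import defaultdict
from types import SimpleNamespace

def group_by_segment(frames):
    """Group frames into driving segments keyed by context.name, each sorted
    by timestamp_micros. `frames` is any iterable of objects exposing
    frame.context.name and frame.timestamp_micros.
    """
    segments = defaultdict(list)
    for frame in frames:
        segments[frame.context.name].append(frame)
    for frames_in_segment in segments.values():
        frames_in_segment.sort(key=lambda f: f.timestamp_micros)
    return dict(segments)

# Demo with stand-in frames:
mk = lambda name, ts: SimpleNamespace(context=SimpleNamespace(name=name),
                                      timestamp_micros=ts)
segments = group_by_segment([mk("seg-a", 2), mk("seg-b", 1), mk("seg-a", 1)])
print(sorted(segments))  # → ['seg-a', 'seg-b']
```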
