Let's start with the paper's title: PU-GAN: a Point Cloud Upsampling Adversarial Network.
PU stands for Point Upsampling, so the task of this paper is point cloud upsampling. I introduced the task in my PU-Net article, see:
刘昕宸: Chewing through papers: PU-Net, the pioneer of point cloud upsampling (zhuanlan.zhihu.com)
GAN is the now-famous generative adversarial network: the paper relies on a GAN to realize the upsampling. Upsampling is itself a kind of generative task, so reaching for a GAN is a very natural idea. For the basics of how GANs work, see:
刘昕宸: Understanding GANs intuitively (1): GANs explained clearly (zhuanlan.zhihu.com)
I discussed the motivation for upsampling in detail in the PU-Net article:
Point cloud processing is challenging, in large part because point clouds as a data form are sparse and irregular.
The upsampling task in this paper tackles exactly that sparsity, providing "higher-quality" data for all kinds of downstream feature-learning tasks.
Simply put, point cloud upsampling takes a point cloud as input and generates a "denser" point cloud that preserves the underlying shape.
In terms of sheer upsampling quality, earlier deep-learning methods such as PU-Net and MPU achieved rather limited results on real-world scanned point clouds. Take a look at the figure at the beginning of the PU-GAN paper (tested on the KITTI dataset):
Point cloud upsampling is, at heart, a generative task, and for generative tasks in vision the natural reflex is: why not give GANs a try?
The goal of this paper is upsampling: given a sparse point set P with N points, we want to generate a dense point set Q with rN points.
Q does not have to be a superset of P, but it should satisfy two conditions: the generated points should describe the same underlying geometry as the target object, i.e. lie on the underlying surface, and they should be distributed uniformly over that surface, even when the input P is sparse and non-uniform.
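In symbols (my notation, following the setup above), the generator learns a mapping

$$G:\; P = \{p_i\}_{i=1}^{N} \;\longmapsto\; Q = \{q_i\}_{i=1}^{rN}.$$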
The PU-GAN architecture is shown below:
Since it is a GAN, the network is split into a Generator and a Discriminator.
The Generator produces a dense point cloud Q from the sparse point cloud P.
The Discriminator tries to tell real dense point clouds apart from those produced by the generator.
See it? The generator still follows the overall PU-Net framework: patch --> feature extraction --> feature expansion --> coordinate reconstruction ;-)
The overall Generator code:
class Generator(object):
    def __init__(self, opts, is_training, name="Generator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.num_point = self.opts.patch_num_point
        self.up_ratio = self.opts.up_ratio
        # expand by a slightly larger ratio, then trim back down with FPS
        self.up_ratio_real = self.up_ratio + self.opts.more_up
        self.out_num_point = int(self.num_point * self.up_ratio)

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):
            # 1) feature extraction: point-wise features via dense connections
            features = ops.feature_extraction(inputs, scope='feature_extraction',
                                              is_training=self.is_training, bn_decay=None)

            # 2) feature expansion: the up-down-up expansion unit
            H = ops.up_projection_unit(features, self.up_ratio_real,
                                       scope="up_projection_unit",
                                       is_training=self.is_training, bn_decay=None)

            # 3) coordinate reconstruction: regress 3D coordinates with two 1x1 convs
            coord = ops.conv2d(H, 64, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer1', bn_decay=None)
            coord = ops.conv2d(coord, 3, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer2', bn_decay=None,
                               activation_fn=None, weight_decay=0.0)
            outputs = tf.squeeze(coord, [2])

            # keep exactly rN points: FPS trims the extra points from more_up
            outputs = gather_point(outputs, farthest_point_sample(self.out_num_point, outputs))
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)
        return outputs
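With the settings used in the experiments later (256-point input patches, up_ratio = 4), the generator first expands by up_ratio + more_up, and farthest point sampling then trims the result back to exactly out_num_point = 1024 points.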
Ⅰ Patch Extraction
For each 3D mesh, randomly pick 200 seed points on the surface, grow a patch around each seed using geodesic distances, and normalize each patch into a unit sphere.
For each patch, use Poisson disk sampling to generate Q̂, the target point cloud with rN points.
During training we randomly sample N points from Q̂ on the fly to form the input point cloud P, as sketched below.
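A minimal sketch of this on-the-fly sampling (plain NumPy; make_training_pair and gt_patch are names I made up for illustration, not PU-GAN code):

import numpy as np

def make_training_pair(gt_patch, up_ratio=4):
    # gt_patch: [rN, 3] Poisson-disk-sampled target patch (Q_hat)
    rN = gt_patch.shape[0]
    N = rN // up_ratio
    # a fresh random subset each visit, so the input P varies between epochs
    idx = np.random.choice(rN, N, replace=False)
    return gt_patch[idx], gt_patch  # (input P with N points, target Q_hat with rN points)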
Ⅱ Feature Extraction
This module extracts point-wise features:
The input is an N*d point cloud (d covers the raw per-point attributes such as coordinates, color, and normals; usually d = 3), and the output is an N*C point-wise feature map.
The module directly borrows the feature extractor of Patch-based progressive 3D point set upsampling (MPU), using dense connections to aggregate features across layers.
The architecture is shown below, and the processing flow is quite clear:
Let's look at the code to make this concrete:
def feature_extraction(inputs, scope='feature_extraction2', is_training=True, bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        use_bn = False
        use_ibn = False
        growth_rate = 24
        dense_n = 3
        knn = 16
        comp = growth_rate * 2  # 48-channel compression between dense blocks

        l0_features = tf.expand_dims(inputs, axis=2)
        l0_features = conv2d(l0_features, 24, [1, 1],
                             padding='VALID', scope='layer0', is_training=is_training,
                             bn=use_bn, ibn=use_ibn, bn_decay=bn_decay, activation_fn=None)
        l0_features = tf.squeeze(l0_features, axis=2)  # 24 channels

        # encoding: each dense block adds 3 * growth_rate = 72 channels,
        # and a skip connection concatenates the block input
        l1_features, l1_idx = dense_conv(l0_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer1", is_training=is_training, bn=use_bn,
                                         ibn=use_ibn, bn_decay=bn_decay)
        l1_features = tf.concat([l1_features, l0_features], axis=-1)  # (24+72)+24 = 120

        l2_features = conv1d(l1_features, comp, 1,  # compress to 48
                             padding='VALID', scope='layer2_prep', is_training=is_training,
                             bn=use_bn, ibn=use_ibn, bn_decay=bn_decay)
        l2_features, l2_idx = dense_conv(l2_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer2", is_training=is_training, bn=use_bn,
                                         bn_decay=bn_decay)
        l2_features = tf.concat([l2_features, l1_features], axis=-1)  # (48+72)+120 = 240

        l3_features = conv1d(l2_features, comp, 1,  # compress to 48
                             padding='VALID', scope='layer3_prep', is_training=is_training,
                             bn=use_bn, ibn=use_ibn, bn_decay=bn_decay)
        l3_features, l3_idx = dense_conv(l3_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer3", is_training=is_training, bn=use_bn,
                                         bn_decay=bn_decay)
        l3_features = tf.concat([l3_features, l2_features], axis=-1)  # (48+72)+240 = 360

        l4_features = conv1d(l3_features, comp, 1,  # compress to 48
                             padding='VALID', scope='layer4_prep', is_training=is_training,
                             bn=use_bn, ibn=use_ibn, bn_decay=bn_decay)
        l4_features, l4_idx = dense_conv(l4_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer4", is_training=is_training, bn=use_bn,
                                         bn_decay=bn_decay)
        l4_features = tf.concat([l4_features, l3_features], axis=-1)  # (48+72)+360 = 480

        l4_features = tf.expand_dims(l4_features, axis=2)
        return l4_features
The core dense_conv implementation:
def dense_conv(feature, n=3, growth_rate=64, k=16, scope='dense_conv', **kwargs):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        # kNN edge features, DGCNN-style: [B, N, K, 2*C]
        y, idx = get_edge_feature(feature, k=k, idx=None)
        for i in range(n):
            if i == 0:
                # first layer: concat the conv output with the (tiled) input feature
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    tf.tile(tf.expand_dims(feature, axis=2), [1, 1, k, 1])], axis=-1)
            elif i == n - 1:
                # last layer: no activation
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i,
                           activation_fn=None, **kwargs),
                    y], axis=-1)
            else:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    y], axis=-1)
        # max over the K neighbors -> [B, N, C_in + n*growth_rate]
        y = tf.reduce_max(y, axis=-2)
        return y, idx
Ⅲ Feature Expansion
Like PU-Net, PU-GAN designs its own feature expansion module, which is arguably the heart of any upsampling network.
PU-Net's approach is to duplicate the point features and process each copy independently with its own MLP.
Even with constraints such as the repulsion loss, this style of expansion tends to produce expanded features that stay too close to one another, which hurts upsampling quality; a minimal sketch of the PU-Net style is shown below for contrast.
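Here is a rough sketch of PU-Net-style duplication (it reuses the same conv2d helper as the code above; punet_style_expansion is my name for illustration, not code from either repo):

def punet_style_expansion(features, up_ratio):
    # features: [B, N, 1, C] point-wise features
    branches = []
    for i in range(up_ratio):
        # every copy gets its own 1x1 conv with independent weights
        branches.append(conv2d(features, 128, [1, 1], padding='VALID',
                               scope='expand_branch_%d' % i))
    # stack the r copies along the point axis -> [B, r*N, 1, 128]
    return tf.concat(branches, axis=1)

Nothing ties the r branches together, so the copies easily collapse onto each other, which is exactly the problem PU-GAN's grid codes and up-down-up unit address.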
The module takes point-wise features of size N*C as input and outputs expanded features of size rN*C'.
PU-GAN designs an up-down-up expansion unit to strengthen the feature expansion, enabling the generator to produce more diverse point distributions.
The structure is shown below, together with the up-feature and down-feature operators:
The code makes the flow clear: the input point features L go through the up-feature operator to produce H0, and the down-feature operator then downsamples H0 back to L0.
Compute the difference E0 = L0 - L between the downsampled features and the original input.
Feed E0 into the up-feature operator to get H1, and use H1 as an offset on H0, giving the output H2 = H0 + H1.
def up_projection_unit(inputs, up_ratio, scope="up_projection_unit", is_training=True, bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        # L: compressed input point features
        L = conv2d(inputs, 128, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv0', bn_decay=bn_decay)

        H0 = up_block(L, up_ratio, is_training=is_training, bn_decay=bn_decay, scope='up_0')       # up
        L0 = down_block(H0, up_ratio, is_training=is_training, bn_decay=bn_decay, scope='down_0')  # down
        E0 = L0 - L   # residual between the round-trip features and the input
        H1 = up_block(E0, up_ratio, is_training=is_training, bn_decay=bn_decay, scope='up_1')      # up again
        H2 = H0 + H1  # correct H0 with the upsampled residual
        return H2
The up-feature operator:
Instead of PU-Net's plain duplication, PU-GAN attaches a 2D grid code to the duplicated point features (cf. FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation), which amounts to adding new points near each input point.
A self-attention unit then integrates the duplicated point features.
def up_block(inputs, up_ratio, scope='up_block', is_training=True, bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        net = inputs  # [B, N, 1, C]
        grid = gen_grid(up_ratio)  # [up_ratio, 2]
        # one 2D grid code per copy: [B, up_ratio*N, 1, 2] after the reshape
        grid = tf.tile(tf.expand_dims(grid, 0), [tf.shape(net)[0], 1, tf.shape(net)[1]])
        grid = tf.reshape(grid, [tf.shape(net)[0], -1, 1, 2])

        net = tf.tile(net, [1, up_ratio, 1, 1])  # duplicate features: [B, up_ratio*N, 1, C]
        net = tf.concat([net, grid], axis=-1)    # append the grid codes

        net = attention_unit(net, is_training=is_training)

        net = conv2d(net, 256, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv2', bn_decay=bn_decay)
        return net
1) The grid mechanism
A unique 2D vector is generated for each feature-map copy and appended to every point feature inside that copy.
This 2D vector is what keeps the duplicated point features slightly different from one another.
def gen_grid(up_ratio):
    """Output: [num_grid_point, 2] grid codes."""
    import math
    # factor up_ratio into num_x * num_y, as close to square as possible
    sqrted = int(math.sqrt(up_ratio)) + 1
    for i in reversed(range(1, sqrted + 1)):
        if up_ratio % i == 0:
            num_x = i
            num_y = up_ratio // i
            break
    grid_x = tf.lin_space(-0.2, 0.2, num_x)
    grid_y = tf.lin_space(-0.2, 0.2, num_y)

    x, y = tf.meshgrid(grid_x, grid_y)
    grid = tf.reshape(tf.stack([x, y], axis=-1), [-1, 2])  # e.g. [2, 2, 2] -> [4, 2]
    return grid
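For example, gen_grid(4) factors 4 into 2 x 2 and returns the four corner codes (±0.2, ±0.2); each of the four feature copies gets one of these corners appended to every point feature.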
2) The attention mechanism
def attention_unit(inputs, scope='attention_unit', is_training=True):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        dim = inputs.get_shape()[-1].value
        layer = dim // 4
        f = conv2d(inputs, layer, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_f', bn_decay=None)
        g = conv2d(inputs, layer, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_g', bn_decay=None)
        h = conv2d(inputs, dim, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_h', bn_decay=None)

        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # [bs, N, N]
        beta = tf.nn.softmax(s, axis=-1)  # attention map

        o = tf.matmul(beta, hw_flatten(h))  # [bs, N, N] * [bs, N, c] -> [bs, N, c]
        gamma = tf.get_variable("gamma", [1], initializer=tf.constant_initializer(0.0))

        o = tf.reshape(o, shape=inputs.shape)  # [bs, h, w, C]
        x = gamma * o + inputs
        return x
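A small design note: gamma is initialized to zero, so the unit starts out as an identity mapping, and the attention branch is blended in gradually as training proceeds (the same trick used in SAGAN-style self-attention).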
The down-feature operator:
The down structure is simpler:
to downsample the expanded features, reshape them to separate the r copies, then use a few MLPs to regress features of the original size.
def down_block(inputs, up_ratio, scope='down_block', is_training=True, bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        net = inputs
        # [B, rN, 1, C] -> [B, N, up_ratio, C]: expose the r copies of each point
        net = tf.reshape(net, [tf.shape(net)[0], up_ratio, -1, tf.shape(net)[-1]])
        net = tf.transpose(net, [0, 2, 1, 3])

        net = conv2d(net, 256, [1, up_ratio],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv2', bn_decay=bn_decay)
        return net
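The [1, up_ratio] convolution is what actually performs the downsampling: after the reshape and transpose, the r copies of each point sit along one axis, and the VALID convolution collapses them back into a single per-point feature.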
Ⅳ Coordinate Reconstruction
Finally, the coordinate reconstruction:
coord = ops.conv2d(H, 64, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=self.is_training,
                   scope='fc_layer1', bn_decay=None)

coord = ops.conv2d(coord, 3, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=self.is_training,
                   scope='fc_layer2', bn_decay=None,
                   activation_fn=None, weight_decay=0.0)

outputs = tf.squeeze(coord, [2])
The Discriminator's job is to tell whether an upsampled point cloud was produced by the Generator.
It first uses a lightweight network that combines local and global information to extract a global feature.
The Discriminator also inserts a self-attention unit to enhance the feature integration and improve the subsequent feature extraction capability.
Finally, MLPs plus pooling yield a confidence value, which can be read as the Discriminator's estimate of how likely the input is a real dense point cloud.
class Discriminator(object):
    def __init__(self, opts, is_training, name="Discriminator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.bn = False
        self.start_number = 32

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):
            inputs = tf.expand_dims(inputs, axis=2)
            with tf.variable_scope('encoder_0', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(inputs, [self.start_number, self.start_number * 2])
                # global feature by max-pooling, concatenated back to every point
                features_global = tf.reduce_max(features, axis=1, keep_dims=True, name='maxpool_0')
                features = tf.concat([features, tf.tile(features_global, [1, tf.shape(inputs)[1], 1, 1])], axis=-1)
            features = ops.attention_unit(features, is_training=self.is_training)
            with tf.variable_scope('encoder_1', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(features, [self.start_number * 4, self.start_number * 8])
                features = tf.reduce_max(features, axis=1, name='maxpool_1')

            with tf.variable_scope('decoder', reuse=tf.AUTO_REUSE):
                outputs = ops.mlp(features, [self.start_number * 8, 1])
                outputs = tf.reshape(outputs, [-1, 1])  # one confidence value per cloud

        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)
        return outputs
Because of the GAN setup, PU-GAN uses quite a few losses, split into the Generator loss and the Discriminator loss:

self.D_loss = discriminator_loss(self.D, self.input_y, self.G_y)
The discriminator_loss consists solely of the adversarial loss, and it is simple:
Q̂ (input_real) is the real point cloud, Q (input_fake) is the fake point cloud produced by the generator, and D(·) is the discriminator's output confidence value.
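Written out, this is the least-squares GAN objective that the code below implements:

$$L_{dis} = \mathbb{E}\big[(D(\hat{Q}) - 1)^2\big] + \mathbb{E}\big[D(Q)^2\big]$$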
def discriminator_loss(D, input_real, input_fake, Ra=False, gan_type='lsgan'):
    real = D(input_real)
    fake = D(input_fake)
    # least-squares GAN: push D(real) towards 1 and D(fake) towards 0
    real_loss = tf.reduce_mean(tf.square(real - 1.0))
    fake_loss = tf.reduce_mean(tf.square(fake))

    loss = real_loss + fake_loss
    return loss
# reconstruction (fidelity) term, EMD by default
self.dis_loss = self.opts.fidelity_w * pc_distance(self.G_y, self.input_y, radius=self.pc_radius)

if self.opts.use_repulse:
    self.repulsion_loss = self.opts.repulsion_w * get_repulsion_loss(self.G_y)
else:
    self.repulsion_loss = 0

self.uniform_loss = self.opts.uniform_w * get_uniform_loss(self.G_y)
self.pu_loss = self.dis_loss + self.uniform_loss + self.repulsion_loss + tf.losses.get_regularization_loss()

# adversarial term
self.G_gan_loss = self.opts.gan_w * generator_loss(self.D, self.G_y)
self.total_gen_loss = self.G_gan_loss + self.pu_loss
The generator loss combines the reconstruction loss, the repulsion loss, the uniform loss, and the adversarial loss.
1) The adversarial loss
It mirrors the Discriminator's adversarial loss above, except the generator tries to push D(Q) towards 1:
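In the same least-squares form, the generator's adversarial term is:

$$L_{gan}(G) = \mathbb{E}\big[(D(Q) - 1)^2\big]$$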
def generator_loss(D, input_fake):
    fake = D(input_fake)
    # push D(fake) towards 1
    fake_loss = tf.reduce_mean(tf.square(fake - 1.0))
    return fake_loss
2) The reconstruction loss
By default PU-GAN uses the EMD loss; for a detailed treatment of EMD see:
刘昕宸: Point cloud distance metrics: a full analysis of the Earth Mover's Distance (EMD) (zhuanlan.zhihu.com)
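For reference, since the generated Q and the target Q̂ have the same size here, the EMD is the cost of the best bijection φ between the two sets:

$$L_{rec} = d_{EMD}(Q, \hat{Q}) = \min_{\phi: Q \to \hat{Q}} \sum_{q \in Q} \lVert q - \phi(q) \rVert_2$$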
3) The repulsion loss
The repulsion loss is taken from PU-Net; for details see:
刘昕宸: Chewing through papers: PU-Net, the pioneer of point cloud upsampling (zhuanlan.zhihu.com)
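Roughly, as defined in PU-Net (x_i are the generated points and K(i) is the index set of the k nearest neighbors of x_i):

$$L_{rep} = \sum_{i=0}^{\hat{N}} \sum_{i' \in K(i)} \eta\big(\lVert x_{i'} - x_i \rVert\big)\, w\big(\lVert x_{i'} - x_i \rVert\big),$$

with the repulsion term η(r) = -r and a fast-decaying weight w(r) = e^{-r²/h²}, so that generated points that sit too close push each other apart.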
4) The uniform loss
A major contribution of PU-GAN is the uniform loss, designed to control how uniformly the generated points are distributed.
PU-Net had earlier proposed the NUC metric to measure the uniformity of generated points, but NUC ignores the local clutter of points, so it is no longer a good choice.
What does "ignores the local clutter of points" mean?
The three disks below contain the same number of points (hence identical NUC values), yet their uniformity is clearly different; NUC fails because it cannot capture how uniformly points are distributed locally.
The uniform loss, by contrast, is designed to consider both the global and the local distribution!
The first term:
For a point set Q with rN points (in the experiments, one patch):
Step 1. Use farthest point sampling (FPS) to pick M seed points.
Step 2. Around each seed point, run a ball query with radius r_d to obtain a point subset S_j ⊂ Q, j = 1...M.
Analysis:
Each S_j lies roughly on a local disk of the surface of Q with area π·r_d².
Remember Patch Extraction (Ⅰ) above? Patches are extracted by geodesic distance and normalized, so each patch lives in a unit sphere and its surface area is roughly π·1².
The expected percentage of points falling inside S_j is therefore p = π·r_d² / (π·1²) = r_d².
And the expected number of points in S_j is n̂ = rN·p.
Following a chi-square model, the first term of the uniform loss measures the deviation of |S_j| from n̂: U_imbalance(S_j) = (|S_j| - n̂)² / n̂.
The second term:
To capture local point clutter: for each point in S_j, find its nearest neighbor and compute the distance d_{j,k} (k indexes the k-th point of S_j).
Imagine S_j being perfectly uniform: neighboring points would then form a roughly regular hexagonal pattern:
In that case the expected point-to-neighbor distance is d̂ = sqrt(2π·r_d² / (√3·n̂)); the released code uses the simpler square-lattice approximation d̂ = sqrt(π·r_d² / n̂).
Again following the chi-square model, the second term measures the deviation of d_{j,k} from d̂: U_clutter(S_j) = Σ_k (d_{j,k} - d̂)² / d̂.
Putting the two together gives the final uniform loss: L_uni = Σ_{j=1}^{M} U_imbalance(S_j) · U_clutter(S_j).
The implementation:
def get_uniform_loss(pcd, percentages=[0.004, 0.006, 0.008, 0.010, 0.012], radius=1.0):
    B, N, C = pcd.get_shape().as_list()
    npoint = int(N * 0.05)  # number of FPS seeds, M
    loss = []
    for p in percentages:
        nsample = int(N * p)       # expected number of points per disk, n_hat
        r = math.sqrt(p * radius)  # ball-query radius r_d, so the disk covers fraction p of the patch
        disk_area = math.pi * (radius ** 2) * p / nsample  # expected area per point in a disk
        new_xyz = gather_point(pcd, farthest_point_sample(npoint, pcd))  # (batch_size, npoint, 3) seeds
        idx, pts_cnt = query_ball_point(r, nsample, pcd, new_xyz)        # (batch_size, npoint, nsample)

        # expect_len = tf.sqrt(2 * disk_area / 1.732)  # hexagonal-lattice d_hat
        expect_len = tf.sqrt(disk_area)  # square-lattice approximation of d_hat

        grouped_pcd = group_point(pcd, idx)
        grouped_pcd = tf.concat(tf.unstack(grouped_pcd, axis=1), axis=0)

        # nearest neighbor within each disk (index 0 is the point itself)
        var, _ = knn_point(2, grouped_pcd, grouped_pcd)
        uniform_dis = -var[:, :, 1:]
        uniform_dis = tf.sqrt(tf.abs(uniform_dis + 1e-8))
        uniform_dis = tf.reduce_mean(uniform_dis, axis=[-1])
        # chi-square style deviation of d_{j,k} from the expected distance
        uniform_dis = tf.square(uniform_dis - expect_len) / (expect_len + 1e-8)
        uniform_dis = tf.reshape(uniform_dis, [-1])

        mean, variance = tf.nn.moments(uniform_dis, axes=0)
        mean = mean * math.pow(p * 100, 2)  # re-weight the different scales
        loss.append(mean)
    return tf.add_n(loss) / len(percentages)
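Note the percentages argument: the loss is evaluated on disks of five different sizes (p from 0.4% to 1.2% of the patch) and averaged, so uniformity is encouraged at several scales rather than at a single radius.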
147 models were collected from the PU-Net and MPU datasets and the Visionair repository, covering as many object types as possible; 120 models are used for training and 27 for testing.
Training data preparation: since PU-GAN is patch-based, patches are extracted from each model first. With 200 patches per training model, the 120 models yield 24,000 training patches, where each patch is an (input patch, ground-truth patch) pair: 256 points in the input patch and 1,024 points in the ground-truth patch.
1. Quantitative comparison
Four classes of evaluation metrics were used: the uniformity metric proposed in this paper, point-to-surface (P2F) distance, Chamfer distance (CD), and Hausdorff distance (HD).
2. Qualitative comparison
3. Upsampling real scans
On the KITTI dataset:
4. Ablation study
PU-GAN largely follows the line of PU-Net, but clearly improves on many of the details (gridding, self-attention, dense connections, the up-down-up unit, the uniform loss, and so on), and it does deliver noticeably stronger results.
Even more noteworthy, PU-GAN is also evaluated on the real scanned dataset KITTI and still performs well, which both demonstrates the network's generalization ability and, just as importantly, shows its practical value.