FCN and U-Net were both published in 2015, one shortly after the other. They share the same basic idea: encode first, then decode (encoder-decoder), ending with a feature map the same size as the input image, and compute a loss between every point of that feature map and the corresponding pixel of the annotation mask. Their main difference is how features are fused: FCN fuses features by elementwise addition, while U-Net fuses them by stacking the two feature maps along the channel dimension. This post reproduces FCN with TensorFlow and U-Net with PyTorch.
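To make the two fusion styles concrete, here is a tiny illustration with hypothetical tensors (not taken from either paper): elementwise addition keeps the channel count and requires both features to have the same shape, while channel concatenation stacks the channels.

import torch

a = torch.randn(1, 21, 16, 16)   # e.g. a class-score map from a deep layer
b = torch.randn(1, 21, 16, 16)   # a score map from a shallower skip connection

fcn_style = a + b                        # [1, 21, 16, 16]  channels unchanged (FCN-style add)
unet_style = torch.cat((a, b), dim=1)    # [1, 42, 16, 16]  channels stacked (U-Net-style concat)
print(fcn_style.shape, unet_style.shape)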
A collection of semantic-segmentation code found on GitHub: https://github.com/mrgloom/awesome-semantic-segmentation
For FCN, this post mainly covers FCN-8s. The FCN paper builds three variants: FCN-32s, FCN-16s, and FCN-8s. FCN-32s fuses no shallow features and directly upsamples the deep features; FCN-16s fuses one shallow layer; FCN-8s fuses two shallow layers and gives the best segmentation results.
The FCN-8s encoder is a fully convolutional version of VGG-16, with VGG's original fully connected layers replaced by convolutions. The code here is written against the TF 1.x framework using the high-level tf.layers API to build the network; only the network-definition part is shown.
Original FCN Caffe project: https://github.com/shelhamer/fcn.berkeleyvision.org
import tensorflow as tf


def encode(input):
    # Fully convolutional version of VGG
    conv1_1 = tf.layers.conv2d(input, 64, 3, padding='SAME', activation=tf.nn.relu)
    conv1_2 = tf.layers.conv2d(conv1_1, 64, 3, padding='SAME', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1_2, pool_size=[2, 2], strides=2)

    conv2_1 = tf.layers.conv2d(pool1, 128, 3, padding='SAME', activation=tf.nn.relu)
    conv2_2 = tf.layers.conv2d(conv2_1, 128, 3, padding='SAME', activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2_2, pool_size=[2, 2], strides=2)

    conv3_1 = tf.layers.conv2d(pool2, 256, 3, padding='SAME', activation=tf.nn.relu)
    conv3_2 = tf.layers.conv2d(conv3_1, 256, 3, padding='SAME', activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(conv3_2, pool_size=[2, 2], strides=2)

    conv4_1 = tf.layers.conv2d(pool3, 512, 3, padding='SAME', activation=tf.nn.relu)
    conv4_2 = tf.layers.conv2d(conv4_1, 512, 3, padding='SAME', activation=tf.nn.relu)
    pool4 = tf.layers.max_pooling2d(conv4_2, pool_size=[2, 2], strides=2)

    conv5_1 = tf.layers.conv2d(pool4, 512, 3, padding='SAME', activation=tf.nn.relu)
    conv5_2 = tf.layers.conv2d(conv5_1, 512, 3, padding='SAME', activation=tf.nn.relu)
    pool5 = tf.layers.max_pooling2d(conv5_2, pool_size=[2, 2], strides=2)

    # With a 128x128 input the feature map here is 4x4, so a kernel of size 4 would equal
    # the feature-map size; a 1x1 convolution replaces VGG's fully connected layer.
    fc6 = tf.layers.conv2d(pool5, 4096, 1, padding='valid', activation=tf.nn.relu)
    fc6 = tf.layers.dropout(fc6, 0.5)   # the original code dropped this assignment, so dropout had no effect
    fc7 = tf.layers.conv2d(fc6, 4096, 1, padding='valid', activation=tf.nn.relu)
    fc7 = tf.layers.dropout(fc7, 0.5)
    return pool3, pool4, fc7


def decode(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes):
    """
    Create the layers for a fully convolutional network.
    Build skip-layers using the vgg layers.
    :param vgg_layer3_out: TF Tensor for VGG Layer 3 output
    :param vgg_layer4_out: TF Tensor for VGG Layer 4 output
    :param vgg_layer7_out: TF Tensor for VGG Layer 7 output
    :param num_classes: Number of classes to classify
    :return: The Tensor for the last layer of output
    """
    # This 1x1 score convolution may not be needed (and the original code overrode it anyway),
    # so it is commented out here; fc7 goes straight into the transposed convolution,
    # which already outputs num_classes channels.
    # layer7_conv = tf.layers.conv2d(vgg_layer7_out, num_classes, 1, padding='SAME',
    #                                kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3))
    layer7_conv = vgg_layer7_out
    layer7_trans = tf.layers.conv2d_transpose(layer7_conv, num_classes, 4, 2, padding='SAME',
                                              kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3))

    # This convolution prepares the feature for fusion with the transposed-convolution output:
    # the two features must have the same channel dimension, so the VGG feature is projected
    # down to num_classes channels.
    layer4_conv = tf.layers.conv2d(vgg_layer4_out, num_classes, 1, padding='SAME',
                                   kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3))
    layer4_out = tf.add(layer7_trans, layer4_conv)
    layer4_trans = tf.layers.conv2d_transpose(layer4_out, num_classes, 4, 2, padding='SAME',
                                              kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3))

    layer3_conv = tf.layers.conv2d(vgg_layer3_out, num_classes, 1, padding='SAME',
                                   kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3))
    layer3_out = tf.add(layer3_conv, layer4_trans)
    last_layer = tf.layers.conv2d_transpose(layer3_out, num_classes, 16, 8, padding='SAME',
                                            kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-3),
                                            name="last_layer")
    return last_layer
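As a quick sanity check, here is a minimal training-graph sketch using the encode/decode functions and the tensorflow import above. The shapes and names are hypothetical (a 128x128 RGB input and integer masks are assumed); it only shows how the output map is tied to a per-pixel cross-entropy loss against the annotation mask.

num_classes = 2
images = tf.placeholder(tf.float32, [None, 128, 128, 3])
labels = tf.placeholder(tf.int32, [None, 128, 128])       # per-pixel class ids from the mask

pool3, pool4, fc7 = encode(images)
logits = decode(pool3, pool4, fc7, num_classes)            # [N, 128, 128, num_classes]

# loss between every point of the output map and the corresponding mask pixel
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)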
FCN references:
https://blog.csdn.net/m0_37862527/article/details/79843963
https://blog.csdn.net/weixin_40519315/article/details/104412740
https://zhuanlan.zhihu.com/p/62995971?utm_source=wechat_session
https://blog.csdn.net/qq_36269513/article/details/80420363
https://blog.csdn.net/u013303599/article/details/79231503
github:
https://github.com/shelhamer/fcn.berkeleyvision.org
https://github.com/pierluigiferrari/fcn8s_tensorflow
https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn8_vgg.py
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/fcn8s.py
U-Net fuses features by concatenation. The network looks like a symmetric U-shaped structure, but note that its input and output sizes differ: the input is 572×572 while the output is 388×388. The reason is that the convolutions use no padding, so the feature map shrinks after every convolution. A typical segmentation network needs the output to be the same size as the input so that every input pixel can be classified. U-Net handles this mismatch by enlarging the region fed into the network: a larger patch is used to predict the segmentation of the smaller central region, which effectively brings in the context around the target region, and the missing border data is filled in with the overlap-tile strategy.
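To make the 572 → 388 figure concrete, here is a small plain-Python sketch (arithmetic only, no framework) tracing the spatial size through the original unpadded U-Net: each valid 3×3 convolution removes 2 pixels per dimension, each 2×2 pooling halves the size, and each up-convolution doubles it.

def unet_output_size(size=572, depth=4):
    for _ in range(depth):      # contracting path
        size -= 4               # two valid 3x3 convolutions, -2 pixels each
        size //= 2              # 2x2 max pooling
    size -= 4                   # the two convolutions at the bottom of the "U"
    for _ in range(depth):      # expanding path
        size *= 2               # 2x2 up-convolution doubles the size
        size -= 4               # two valid 3x3 convolutions after concatenation
    return size                 # the final 1x1 convolution does not change the size

print(unet_output_size(572))    # prints 388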
The figure below illustrates the overlap-tile strategy. Predicting the segmentation inside the yellow box requires the image data inside the blue box as input, and the missing data is extrapolated by mirroring. The white box is the original input image; the border area is generated by mirroring the original image. The image inside the blue box is then fed in to predict the segmentation of the yellow box (the image is segmented region by region because a whole image is too large to fit in GPU memory).
The U-Net architecture is fairly symmetric: the left half downsamples with convolutions plus pooling, and the right half upsamples with convolutions plus transposed convolutions, finally restoring the features to the original image size. The U-Net architecture diagram is very clear, and the network can be built by following it directly. The reproduction here uses PyTorch and differs from the original paper in two places:
(1) The convolutions in this code use padding, whereas the original U-Net convolutions do not, so each convolution there shrinks the feature map by 2 pixels in height and width. To follow the original U-Net exactly, the feature from the left (encoder) side must first be resized to match the right (decoder) side before the two are concatenated; in PyTorch this rescaling can be done by interpolation with torch.nn.functional.upsample() (see the sketch after this list). Some argue that repeatedly convolving padded feature maps lets the error introduced by the padded border grow, because the deeper the convolutions go, the more abstract the features become and the more they are affected by the padding.
(2) The original paper does not use batch normalization; the code here adds it.
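Below is a minimal sketch, with hypothetical tensor shapes and separate from the reproduction that follows, of matching sizes before concatenation when unpadded convolutions are kept. It shows the interpolation approach suggested in (1), using F.interpolate (the non-deprecated equivalent of torch.nn.functional.upsample), alongside the center-crop that the original paper uses.

import torch
import torch.nn.functional as F

enc = torch.randn(1, 64, 136, 136)   # encoder (left-side) feature, larger
dec = torch.randn(1, 64, 104, 104)   # decoder feature after the up-convolution, smaller

# Option A: interpolate the encoder feature down to the decoder size, then concatenate
enc_resized = F.interpolate(enc, size=dec.shape[2:], mode='bilinear', align_corners=False)
fused_a = torch.cat((dec, enc_resized), dim=1)        # [1, 128, 104, 104]

# Option B (what the original paper does): center-crop the encoder feature instead
dh = (enc.shape[2] - dec.shape[2]) // 2
dw = (enc.shape[3] - dec.shape[3]) // 2
enc_crop = enc[:, :, dh:dh + dec.shape[2], dw:dw + dec.shape[3]]
fused_b = torch.cat((dec, enc_crop), dim=1)           # [1, 128, 104, 104]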
import torch
import torch.nn as nn


class convBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(convBlock, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),   # the original paper does not mention batch normalization
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            # nn.MaxPool2d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        x = self.cnn(x)
        return x


class upSampling(nn.Module):
    def __init__(self, in_channels, middle_channels, out_channels):
        super(upSampling, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),
            nn.Conv2d(middle_channels, middle_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(middle_channels),
            nn.ReLU(),
            # stride is the step size, i.e. the upsampling factor
            nn.ConvTranspose2d(middle_channels, out_channels, kernel_size=2, stride=2)
        )

    def forward(self, x):
        x = self.cnn(x)
        return x


class uNet(nn.Module):
    def __init__(self, num_classes):
        super(uNet, self).__init__()
        self.enCode1 = convBlock(in_channels=3, out_channels=64)
        self.enCode2 = convBlock(in_channels=64, out_channels=128)
        self.enCode3 = convBlock(in_channels=128, out_channels=256)
        self.enCode4 = convBlock(in_channels=256, out_channels=512)
        self.Maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.deCode1 = upSampling(in_channels=512, middle_channels=1024, out_channels=512)
        self.deCode2 = upSampling(in_channels=1024, middle_channels=512, out_channels=256)
        self.deCode3 = upSampling(in_channels=512, middle_channels=256, out_channels=128)
        self.deCode4 = upSampling(in_channels=256, middle_channels=128, out_channels=64)
        self.lastLayer = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, num_classes, kernel_size=1)   # output channels = number of annotated classes
        )

    def forward(self, x):
        enc1 = self.enCode1(x)
        enc1_pool = self.Maxpool(enc1)
        enc2 = self.enCode2(enc1_pool)
        enc2_pool = self.Maxpool(enc2)
        enc3 = self.enCode3(enc2_pool)
        enc3_pool = self.Maxpool(enc3)
        enc4 = self.enCode4(enc3_pool)
        enc4_pool = self.Maxpool(enc4)

        dec1 = self.deCode1(enc4_pool)
        dec2 = self.deCode2(torch.cat((dec1, enc4), dim=1))
        dec3 = self.deCode3(torch.cat((dec2, enc3), dim=1))
        dec4 = self.deCode4(torch.cat((dec3, enc2), dim=1))
        out = self.lastLayer(torch.cat((dec4, enc1), dim=1))
        return out
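A quick smoke test of the uNet class above (hypothetical shapes, assuming the input height and width are divisible by 16 so the four poolings and up-convolutions line up), including the per-pixel cross-entropy loss against the mask mentioned at the top of the post:

model = uNet(num_classes=2)
images = torch.randn(4, 3, 128, 128)
masks = torch.randint(0, 2, (4, 128, 128))    # per-pixel class ids

logits = model(images)                        # [4, 2, 128, 128], same spatial size as the input
loss = nn.CrossEntropyLoss()(logits, masks)   # loss against the annotation mask, pixel by pixel
loss.backward()
print(logits.shape, loss.item())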
U-Net references:
https://zhuanlan.zhihu.com/p/31428783
https://zhuanlan.zhihu.com/p/118540575
https://zhuanlan.zhihu.com/p/87593567
https://blog.csdn.net/l2181265/article/details/87735610
https://www.yuque.com/yahei/hey-yahei/segmentation
github:
https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/models/u_net.py
https://github.com/LeeJunHyun/Image_Segmentation