Title: TINYCD: A (Not So) Deep Learning Model For Change Detection
paper:
https://arxiv.org/abs/2207.13159
code:
https://github.com/AndreaCodegoni/Tiny_model_4_CD
Change detection (CD) aims to detect changes that occurred in an area by comparing two images of that location taken at different times. The challenging part of CD is tracking the changes the user wants to highlight, such as new buildings, while ignoring changes caused by external factors such as the environment, lighting conditions, fog, or seasonal variations. Recent developments in deep learning have allowed researchers to achieve outstanding performance in this field. In particular, various spatio-temporal attention mechanisms exploit the spatial features extracted by the model and relate them temporally across the two available images. The downside is that these models have become increasingly complex and large, often infeasible for edge applications. These are real limitations when the model must be deployed in industrial settings or in applications requiring real-time performance. In this work we propose a novel model, called TinyCD, and show that it is both lightweight and effective, matching or surpassing the current state of the art with 13 to 150 times fewer parameters. In our approach we exploit the importance of low-level features for comparing images. To do so, we use only a few backbone blocks; this strategy keeps the number of network parameters low. To fuse the features extracted from the two images, we introduce a novel, parameter-efficient mixing block capable of cross-correlating features in both the spatial and temporal domains. Finally, to fully exploit the information contained in the computed features, we define a PW-MLP block that performs pixel-wise classification.
The main goal of change detection is to build a model that can identify the changes that occurred in a scene between two different times: a CD model compares two co-registered images acquired at times t1 and t2.
Existing deep-learning-based change detection models are too complex and large to be deployed in industrial scenarios or edge applications.
Our work is inspired by BIT ("Remote Sensing Image Change Detection with Transformers"). We argue that the information contained in each pixel at different resolutions, i.e., the semantic tokens, is essential for the final classification, and that it matters more than the global context provided by spatial attention. Moreover, our intuition is that, since two images are available for comparison, low-level features alone suffice to highlight the changes that occurred over time. To this end, we designed a Siamese U-Net-style network whose backbone consists of the first four blocks of EfficientNet-b4. To better fuse the information contained in the channels across space and time, we introduce a mixing strategy that forces the network to fuse and compare the features extracted at times T1 and T2 in a semantically coherent way. Finally, to exploit all the information contained in each pixel/semantic token, we apply multi-layer perceptrons (MLPs) extensively on each pixel/semantic token. The MLPs are used both to create the spatial attention masks for the skip connections in the U-Net structure, and in the classification block to classify each pixel.
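To make the mixing strategy concrete, here is a minimal, self-contained sketch (tensor sizes are illustrative, not from the paper) of how stacking and reshaping interleave the channels of the two temporal feature maps, so that the grouped convolution in the MixingBlock defined further down sees each (T1, T2) channel pair together:

import torch

# Features extracted at times T1 and T2: (batch, C, H, W)
x = torch.randn(1, 3, 8, 8)  # T1
y = torch.randn(1, 3, 8, 8)  # T2

# Stack along a new axis, then fold it into the channel dimension.
# The resulting channel order is [x_0, y_0, x_1, y_1, x_2, y_2].
mixed = torch.stack((x, y), dim=2)                             # (1, 3, 2, 8, 8)
mixed = mixed.reshape(x.shape[0], -1, x.shape[2], x.shape[3])  # (1, 6, 8, 8)

# With groups=C, each output channel is computed only from its own
# (x_i, y_i) pair, cross-correlating the two time steps per channel:
conv = torch.nn.Conv2d(6, 3, kernel_size=3, groups=3, padding=1)
out = conv(mixed)                                              # (1, 3, 8, 8)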
The network consists of four parts (mirroring the implementation below): (1) a Siamese backbone made of the first blocks of EfficientNet-b4; (2) mixing blocks that fuse the features extracted at T1 and T2; (3) upsampling blocks (UpMask) that gate each decoder stage with the skip-connection attention masks; (4) a final PW-MLP block that classifies each pixel (a quick sanity check of the PW-MLP idea follows).
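Applying the same small MLP to every pixel independently, as the PW-MLP does, is computed exactly by a stack of 1x1 convolutions, which is how PixelwiseLinear below implements it. A quick sanity check of that equivalence (sizes are illustrative):

import torch

pixels = torch.randn(1, 32, 8, 8)  # (batch, channels, H, W)

# A 1x1 convolution and a Linear layer sharing the same weights:
conv = torch.nn.Conv2d(32, 16, kernel_size=1, bias=True)
lin = torch.nn.Linear(32, 16)
lin.weight.data = conv.weight.data.view(16, 32)
lin.bias.data = conv.bias.data

out_conv = conv(pixels)                    # (1, 16, 8, 8)
out_lin = lin(pixels.permute(0, 2, 3, 1))  # (1, 8, 8, 16): per-pixel MLP
print(torch.allclose(out_conv, out_lin.permute(0, 3, 1, 2), atol=1e-6))  # True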
Detailed structure: see the reference implementation below.
WHU-CD: https://study.rsgis.whu.edu.cn/pages/download/building_dataset.html
Preprocessed datasets:
LEVIR-CD
wget https://www.dropbox.com/s/h9jl2ygznsaeg5d/LEVIR-CD-256.zip
WHU-CD
wget https://www.dropbox.com/s/r76a00jcxp5d3hl/WHU-CD-256.zip
Results on the LEVIR-CD dataset:
Results on the WHU-CD dataset:
Comparison of parameter counts, computational cost, and results:
Visualizations:
Ablation: with vs. without skip connections:
Ablation: different mask-generation strategies:
https://github.com/AndreaCodegoni/Tiny_model_4_CD
# change_classifier.py
# https://github.com/AndreaCodegoni/Tiny_model_4_CD

from typing import List

import torchvision
from models.layers import MixingMaskAttentionBlock, PixelwiseLinear, UpMask, MixingBlock
from torch import Tensor
from torch.nn import Module, ModuleList, Sigmoid


class ChangeClassifier(Module):
    def __init__(
        self,
        bkbn_name="efficientnet_b4",
        pretrained=True,
        output_layer_bkbn="3",
        freeze_backbone=False,
    ):
        super().__init__()

        # Load the pretrained backbone according to parameters:
        self._backbone = _get_backbone(
            bkbn_name, pretrained, output_layer_bkbn, freeze_backbone
        )

        # Initialize mixing blocks:
        self._first_mix = MixingMaskAttentionBlock(6, 3, [3, 10, 5], [10, 5, 1])
        self._mixing_mask = ModuleList(
            [
                MixingMaskAttentionBlock(48, 24, [24, 12, 6], [12, 6, 1]),
                MixingMaskAttentionBlock(64, 32, [32, 16, 8], [16, 8, 1]),
                MixingBlock(112, 56),
            ]
        )

        # Initialize Upsampling blocks:
        self._up = ModuleList(
            [
                UpMask(64, 56, 64),
                UpMask(128, 64, 64),
                UpMask(256, 64, 32),
            ]
        )

        # Final classification layer:
        self._classify = PixelwiseLinear([32, 16, 8], [16, 8, 1], Sigmoid())

    def forward(self, ref: Tensor, test: Tensor) -> Tensor:
        features = self._encode(ref, test)
        latents = self._decode(features)
        return self._classify(latents)

    def _encode(self, ref, test) -> List[Tensor]:
        # Run both images through the shared (Siamese) backbone and mix the
        # paired features at every stage to build the skip connections:
        features = [self._first_mix(ref, test)]
        for num, layer in enumerate(self._backbone):
            ref, test = layer(ref), layer(test)
            if num != 0:
                features.append(self._mixing_mask[num - 1](ref, test))
        return features

    def _decode(self, features) -> Tensor:
        # Walk the decoder from the deepest feature map upwards, gating each
        # upsampled map with the mask from the matching encoder stage
        # (features[-2], features[-3], features[-4]):
        upping = features[-1]
        for i, j in enumerate(range(-2, -5, -1)):
            upping = self._up[i](upping, features[j])
        return upping


def _get_backbone(
    bkbn_name, pretrained, output_layer_bkbn, freeze_backbone
) -> ModuleList:
    # The whole model:
    entire_model = getattr(torchvision.models, bkbn_name)(
        pretrained=pretrained
    ).features

    # Slicing it:
    derived_model = ModuleList([])
    for name, layer in entire_model.named_children():
        derived_model.append(layer)
        if name == output_layer_bkbn:
            break

    # Freezing the backbone weights:
    if freeze_backbone:
        for param in derived_model.parameters():
            param.requires_grad = False
    return derived_model

# layers.py

from typing import List, Optional

from torch import Tensor, reshape, stack
from torch.nn import (
    Conv2d,
    InstanceNorm2d,
    Module,
    PReLU,
    Sequential,
    Upsample,
)


class PixelwiseLinear(Module):
    def __init__(
        self,
        fin: List[int],
        fout: List[int],
        last_activation: Optional[Module] = None,
    ) -> None:
        assert len(fout) == len(fin)
        super().__init__()

        # A stack of 1x1 convolutions: the same small MLP applied to every
        # pixel independently (the PW-MLP of the paper):
        n = len(fin)
        self._linears = Sequential(
            *[
                Sequential(
                    Conv2d(fin[i], fout[i], kernel_size=1, bias=True),
                    PReLU()
                    if i < n - 1 or last_activation is None
                    else last_activation,
                )
                for i in range(n)
            ]
        )

    def forward(self, x: Tensor) -> Tensor:
        # Processing the tensor:
        return self._linears(x)


class MixingBlock(Module):
    def __init__(
        self,
        ch_in: int,
        ch_out: int,
    ):
        super().__init__()
        # Grouped 3x3 convolution: with the channels interleaved in forward,
        # each group cross-correlates one (T1, T2) channel pair:
        self._convmix = Sequential(
            Conv2d(ch_in, ch_out, 3, groups=ch_out, padding=1),
            PReLU(),
            InstanceNorm2d(ch_out),
        )

    def forward(self, x: Tensor, y: Tensor) -> Tensor:
        # Packing the tensors and interleaving the channels:
        mixed = stack((x, y), dim=2)
        mixed = reshape(mixed, (x.shape[0], -1, x.shape[2], x.shape[3]))

        # Mixing:
        return self._convmix(mixed)


class MixingMaskAttentionBlock(Module):
    """Use grouped convolution to build a soft attention mask."""

    def __init__(
        self,
        ch_in: int,
        ch_out: int,
        fin: List[int],
        fout: List[int],
        generate_masked: bool = False,
    ):
        super().__init__()
        self._mixing = MixingBlock(ch_in, ch_out)
        self._linear = PixelwiseLinear(fin, fout)
        self._final_normalization = InstanceNorm2d(ch_out) if generate_masked else None
        self._mixing_out = MixingBlock(ch_in, ch_out) if generate_masked else None

    def forward(self, x: Tensor, y: Tensor) -> Tensor:
        z_mix = self._mixing(x, y)
        z = self._linear(z_mix)
        z_mix_out = 0 if self._mixing_out is None else self._mixing_out(x, y)

        return (
            z
            if self._final_normalization is None
            else self._final_normalization(z_mix_out * z)
        )


class UpMask(Module):
    def __init__(
        self,
        up_dimension: int,
        nin: int,
        nout: int,
    ):
        super().__init__()
        self._upsample = Upsample(
            size=(up_dimension, up_dimension), mode="bilinear", align_corners=True
        )
        self._convolution = Sequential(
            Conv2d(nin, nin, 3, 1, groups=nin, padding=1),
            PReLU(),
            InstanceNorm2d(nin),
            Conv2d(nin, nout, kernel_size=1, stride=1),
            PReLU(),
            InstanceNorm2d(nout),
        )

    def forward(self, x: Tensor, y: Optional[Tensor] = None) -> Tensor:
        # Upsample, optionally gate with the skip-connection mask, then convolve:
        x = self._upsample(x)
        if y is not None:
            x = x * y
        return self._convolution(x)
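For completeness, a minimal smoke test on random inputs, assuming the two files above are saved as models/change_classifier.py and models/layers.py (the 256x256 input size matches the UpMask target sizes hard-coded in the model):

import torch
from models.change_classifier import ChangeClassifier

# pretrained=False avoids downloading weights for a quick shape check:
model = ChangeClassifier(pretrained=False).eval()

# Two co-registered RGB images taken at times T1 and T2:
t1 = torch.randn(1, 3, 256, 256)
t2 = torch.randn(1, 3, 256, 256)

with torch.no_grad():
    change_map = model(t1, t2)

print(change_map.shape)  # torch.Size([1, 1, 256, 256]): per-pixel change probabilities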