IEEE Transactions on Information Forensics and Security (TIFS), 2019
Face recognition is a mainstream biometric authentication method.
However, vulnerability to presentation attacks (a.k.a. spoofing) limits its usability in unsupervised (unattended) applications.
As attacks grow more sophisticated (especially 3D masks), building a reliable Presentation Attack Detector (PAD) from the visual spectrum (RGB) alone becomes very challenging. The authors argue that multi-channel (multi-modal) input helps mitigate this: tricking a multi-channel system is harder than tricking a visual-spectrum one, because an attacker would have to mimic real facial features across different representations.
In this paper, the authors release the Wide Multi-Channel presentation Attack (WMCA) database, covering RGB / NIR / Depth / Thermal channels.
They also propose a multi-channel CNN (MC-CNN) for face presentation attack detection that can detect a variety of 2D and 3D attacks in both obfuscation and impersonation settings.
Release of the multi-channel (RGB / NIR / Depth / Thermal) dataset WMCA
Proposal of MC-CNN to tackle multi-channel face presentation attack detection
1) Preprocessing
MTCNN for face detection
Supervised Descent Method (SDM) for facial landmark localization
Face alignment, then resizing to 128x128
Data normalization (to 8-bit format)
A note on the non-RGB channels: their raw data may not be 8-bit (depth, for instance, can arrive as a 16-bit stream), so a Mean Absolute Deviation (MAD) based normalization is used to convert them to 8-bit format.
Speaking of normalization, the first scheme that comes to mind is probably linear ("max-min") normalization:
$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
There is also Z-score normalization:
$x' = \frac{x - \mu}{\sigma}$
MAD-based normalization is another such scheme; for details see "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median".
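The paper does not spell out the exact conversion formula, but following the cited Leys et al. work (which computes the absolute deviation around the median), a plausible sketch of MAD-based normalization of a 16-bit channel to 8-bit looks like this (the robust-range width `n_mads` is an assumption, not from the paper):

```python
import numpy as np

def mad_normalize_to_8bit(img, n_mads=4.0):
    """Sketch: clip to a robust range of median +/- n_mads * MAD,
    then linearly rescale to [0, 255]. The paper's exact conversion
    may differ; this only illustrates the idea."""
    x = img.astype(np.float64)
    med = np.median(x)
    mad = np.median(np.abs(x - med))      # absolute deviation around the median
    lo, hi = med - n_mads * mad, med + n_mads * mad
    x = np.clip(x, lo, hi)
    x = (x - lo) / max(hi - lo, 1e-12) * 255.0   # guard against mad == 0
    return x.astype(np.uint8)

# e.g. a fake 16-bit depth frame
depth = (np.random.rand(128, 128) * 65535).astype(np.uint16)
out = mad_normalize_to_8bit(depth)
print(out.dtype, out.min(), out.max())
```

Unlike max-min normalization, the MAD-based range is robust to outliers (e.g. a few saturated depth pixels will not compress the rest of the range).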
2) Network architecture
PAD datasets are usually small (compared with datasets for natural-image classification or face recognition), being insufficient to train a deep architecture from scratch.
So one has to rely on a pre-trained model (typically from face recognition).
The paper "Heterogeneous face recognition using domain specific units" proposed that
high-level features of Deep Convolutional Neural Networks trained in visual spectra images are potentially domain independent and can be used to encode faces sensed in different image domains
i.e., learn domain-specific low-level feature detectors separately, while sharing the same set of high-level features from the source domain without re-training them.
The authors follow this idea for transfer learning:
adapt the lower layers of the CNN, instead of adapting the whole network, when only a limited amount of target data is available.
Based on the LightCNN network (the 29-layer variant), the PAD architecture is designed as follows:
The gray parts are not trained; they are transferred directly from the pre-trained model.
The loss function is Binary Cross Entropy (BCE).
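A minimal PyTorch sketch of this transfer-learning scheme (not the exact MC-CNN; layer sizes and names are illustrative): each channel gets its own trainable low-level stem, the high-level block stands in for the frozen pre-trained layers shared across channels, and a small head is trained with BCE.

```python
import torch
import torch.nn as nn

def make_stem():
    # domain-specific low-level layers, adapted per channel
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2))

class MultiChannelPAD(nn.Module):
    def __init__(self, n_channels=4):
        super().__init__()
        self.stems = nn.ModuleList(make_stem() for _ in range(n_channels))
        # shared high-level block, standing in for the pre-trained layers
        self.shared = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                                    nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        for p in self.shared.parameters():   # frozen: transferred, not trained
            p.requires_grad = False
        self.head = nn.Linear(32 * n_channels, 1)  # trainable PAD head

    def forward(self, xs):  # xs: list of per-channel tensors (B, 1, H, W)
        feats = [self.shared(stem(x)).flatten(1)
                 for stem, x in zip(self.stems, xs)]
        return self.head(torch.cat(feats, dim=1))

model = MultiChannelPAD()
xs = [torch.randn(2, 1, 128, 128) for _ in range(4)]
logits = model(xs)
loss = nn.BCEWithLogitsLoss()(logits, torch.ones(2, 1))
print(logits.shape)
```

The key design point is that only the per-channel stems and the head carry gradients; the shared block keeps the representation learned on the (much larger) face-recognition data.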
1) Camera setup for data collection
The Intel RealSense SR300 sensor captures RGB / NIR / Depth.
The Seek Thermal Compact PRO sensor captures the thermal images.
Sample captured images are shown below:
2) Camera integration and calibration
RGB / NIR / Depth need no extra work: they come from a single device and are factory-calibrated against each other. What needs calibration is the alignment between the RealSense channels and the thermal camera, so that the data from all channels correspond in both time and space.
Mounting: standard optical mounting posts.
Calibration target: a checkerboard pattern made from materials with different thermal characteristics.
To make the pattern visible on the thermal channel, the target was illuminated by high-power halogen lamps. (Nice trick!)
3) Data collection procedure
Session four was dedicated to presentation attacks only.
The masks and mannequins were heated using a blower prior to capture to make the attack more challenging. (Wow, deliberately raising the difficulty for themselves; I did not see that coming!)
Data is recorded from the sensors for 10 seconds.
4) Presentation attacks
5) Data split protocol
50 frames are uniformly sampled in the temporal domain from each video.
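Uniform temporal sampling can be sketched as picking evenly spaced frame indices (the paper's exact endpoint handling is not specified; this version includes both the first and last frame):

```python
import numpy as np

def uniform_frame_indices(n_frames, n_samples=50):
    # evenly spaced indices over [0, n_frames - 1], rounded to integers
    return np.linspace(0, n_frames - 1, n_samples).round().astype(int)

idx = uniform_frame_indices(300)
print(len(idx), idx[0], idx[-1])  # 50 0 299
```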
6) Evaluation metrics
In PAD evaluation, attacks are usually treated as the positive class and bona fide faces as the negative class.
The decision thresholds are obtained on the dev set at BPCER = 1%.
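A sketch of these two metrics and of the threshold-selection step. The score convention here is an assumption (higher score = more bona fide, accept when score >= threshold); the paper may use the opposite polarity.

```python
import numpy as np

def apcer(attack_scores, thr):
    # fraction of attacks wrongly accepted as bona fide
    return float(np.mean(np.asarray(attack_scores) >= thr))

def bpcer(bonafide_scores, thr):
    # fraction of bona fide wrongly rejected as attacks
    return float(np.mean(np.asarray(bonafide_scores) < thr))

def threshold_at_bpcer(dev_bonafide_scores, target=0.01):
    # threshold that rejects `target` fraction of the dev bona fide scores
    return float(np.quantile(np.asarray(dev_bonafide_scores), target))

rng = np.random.default_rng(0)
dev_bf = rng.normal(0.8, 0.05, 1000)          # fake dev-set bona fide scores
thr = threshold_at_bpcer(dev_bf, 0.01)
print(round(bpcer(dev_bf, thr), 2))           # ~0.01 by construction
```

The chosen threshold is then carried over unchanged to the test set, where APCER and BPCER are reported.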
1) Baseline results
Performance of the individual channels under a conventional feature-extraction + classification pipeline.
For score fusion, each channel's scores are first normalized to 0~1, then a mean fusion is performed to obtain the final PA score.
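The fusion step can be sketched as follows (assuming min-max normalization per channel, which matches the 0~1 range described; the paper may normalize differently):

```python
import numpy as np

def fuse_scores(channel_scores):
    """Min-max normalize each channel's scores to [0, 1], then mean-fuse."""
    normed = []
    for s in channel_scores:
        s = np.asarray(s, dtype=float)
        lo, hi = s.min(), s.max()
        normed.append((s - lo) / max(hi - lo, 1e-12))  # guard constant scores
    return np.mean(normed, axis=0)                     # mean fusion

# two channels with very different score ranges fuse cleanly
rgb   = [0.2, 0.9, 0.4]
depth = [10.0, 80.0, 30.0]
print(fuse_scores([rgb, depth]))
```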
The addition of multiple channels helps in boosting the performance of PAD systems, but the conventional methods have not yet fully exploited the potential of multi-channel fusion.
PS: for a PAD system, BPCER does not have to be extremely low; the key requirement is that APCER must be low.
2) Results with MC-CNN
Noticeably stronger than the conventional baselines.
Why are the MC-CNN and FASNet curves in this figure cut short? The authors give an explanation.
Let's look closer: Figure 7 is drawn by computing APCER and BPCER at a range of thresholds.
As the threshold rises, the system tends to classify everything as an attack: APCER → 0, BPCER → 1, 1-BPCER → 0, which corresponds to moving toward the lower-left end of the curve.
As the threshold drops, the system tends to classify everything as bona fide: APCER → 1, BPCER → 0, 1-BPCER → 1, which corresponds to moving toward the upper-right end of the curve.
The authors explain that the CNN scores are bimodal, concentrated near 0 and 1 with small variance. Say the scores cluster around 0 and 0.9: once the threshold exceeds 0.9, APCER and BPCER stop changing, so the curve cannot extend any further (it collapses to a point).
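This saturation effect is easy to demonstrate with synthetic bimodal scores (the clusters and thresholds below are made up for illustration): every threshold past the top cluster maps to the same (APCER, BPCER) point, so the plotted curve simply stops.

```python
import numpy as np

rng = np.random.default_rng(1)
# bona fide scores clustered near 0.9, attack scores near 0.0 (small variance)
bonafide = np.clip(rng.normal(0.90, 0.01, 500), 0, 1)
attacks  = np.clip(rng.normal(0.02, 0.01, 500), 0, 1)

points = set()
for thr in np.linspace(0.95, 1.0, 20):   # thresholds beyond the top cluster
    apcer = float(np.mean(attacks >= thr))    # attacks accepted as bona fide
    bpcer = float(np.mean(bonafide < thr))    # bona fide rejected as attacks
    points.add((round(apcer, 6), round(bpcer, 6)))
print(points)   # all 20 thresholds collapse to one point
```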
Next, the results against each individual attack type:
Except for glasses, every attack type is detected at 100%.
Glasses remain the weak spot (which is reasonable); rigid masks turn out to be relatively easier than flexible masks; everything else is detected at 100%.
Similarly, attacks on the lower chin could be harder to detect due to the variability introduced by bona fide samples with facial hair and so on.
The authors further suggest that obfuscation-style PAIs may be harder to handle than impersonation-style PAIs.
1) Experiments with adapting different layers
Fine-tuning different subsets of layers.
The "conv" layers being fine-tuned are the gray parts in the figure below.
The performance becomes worse when all layers are adapted. This can be attributed to over-fitting, as the number of parameters to learn is very large.
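The "adapt only the first k blocks" idea can be sketched like this (a toy stand-in network, not the paper's model: freeze everything, then re-enable gradients for the earliest layers only):

```python
import torch.nn as nn

def adapt_first_k(model, k):
    """Freeze all parameters, then unfreeze the first k child modules."""
    for p in model.parameters():
        p.requires_grad = False
    for i, layer in enumerate(model.children()):
        if i < k:
            for p in layer.parameters():
                p.requires_grad = True

# toy 4-stage network; adapt only the first 2 stages
net = nn.Sequential(nn.Conv2d(1, 8, 3), nn.Conv2d(8, 16, 3),
                    nn.Conv2d(16, 32, 3), nn.Linear(32, 1))
adapt_first_k(net, 2)
print([all(p.requires_grad for p in m.parameters()) for m in net.children()])
# → [True, True, False, False]
```

Sweeping k then reproduces the paper's experiment: too small a k under-adapts to the new channels, too large a k over-fits the small PAD dataset.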
Looking at the code, there appear to be only layers 1-9, not 1-10:
https://github.com/AlfredXiangWu/LightCNN/blob/master/light_cnn.py
```python
# excerpt from light_cnn.py (mfm, group, and the residual block
# are defined earlier in the same file)
import torch.nn as nn
import torch.nn.functional as F

class network_29layers(nn.Module):
    def __init__(self, block, layers, num_classes=79077):
        super(network_29layers, self).__init__()
        self.conv1 = mfm(1, 48, 5, 1, 2)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block1 = self._make_layer(block, layers[0], 48, 48)
        self.group1 = group(48, 96, 3, 1, 1)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block2 = self._make_layer(block, layers[1], 96, 96)
        self.group2 = group(96, 192, 3, 1, 1)
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block3 = self._make_layer(block, layers[2], 192, 192)
        self.group3 = group(192, 128, 3, 1, 1)
        self.block4 = self._make_layer(block, layers[3], 128, 128)
        self.group4 = group(128, 128, 3, 1, 1)
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.fc = mfm(8*8*128, 256, type=0)
        self.fc2 = nn.Linear(256, num_classes)

    def _make_layer(self, block, num_blocks, in_channels, out_channels):
        layers = []
        for i in range(0, num_blocks):
            layers.append(block(in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.block1(x)
        x = self.group1(x)
        x = self.pool2(x)
        x = self.block2(x)
        x = self.group2(x)
        x = self.pool3(x)
        x = self.block3(x)
        x = self.group3(x)
        x = self.block4(x)
        x = self.group4(x)
        x = self.pool4(x)
        x = x.view(x.size(0), -1)
        fc = self.fc(x)
        fc = F.dropout(fc, training=self.training)
        out = self.fc2(fc)
        return out, fc
```
2) Experiments with different combinations of channels
Evaluated on the grandtest protocol.
Single-channel importance ordering: T > I > D > G (thermal > infrared > depth > grayscale).
The performance boost in the proposed framework is achieved with the use of multiple channels.
Presentation attack (PA): per the ISO standard, a presentation attack is defined as "a presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system".
Presentation attack instrument (PAI): the means of attack.
For example, if we have silicone masks in the training set, then classifying mannequins as an attack is rather easy.
spatially and temporally aligned channels