Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning
GitHub: https://github.com/binli123/dsmil-wsi
Paper: https://arxiv.org/abs/2011.08939
Background: classification of whole slide images (WSIs), which are extremely high-resolution and lack local (patch-level) annotations.
Contribution: an MIL-based method for WSI classification and lesion detection.
Our method has three major components. First, we introduce a novel MIL aggregator that models the relations of the instances in a dual-stream architecture with trainable distance measurement. Second, since WSIs can produce large or unbalanced bags that hinder the training of MIL models, we propose to use self-supervised contrastive learning to extract good representations for MIL and alleviate the issue of prohibitive memory cost for large bags. Third, we adopt a pyramidal fusion mechanism for multiscale WSI features, and further improve the accuracy of classification and localization.
Datasets: TCGA, Camelyon16
Main challenges for weakly supervised deep MIL models in WSI classification:
When patches (instances) in positive images (bags) are highly unbalanced, i.e., only a small portion of patches are positive, the models are likely to misclassify those positive instances [22] when using a simple aggregation operation, such as the widely adopted max-pooling.
Current models either use fixed patch features extracted by a CNN or only update the feature extractor using a few high score patches, as the end-to-end training of the feature extractor and aggregator is prohibitively expensive for large bags.
Bag: $B = \{(x_1, y_1), \cdots, (x_n, y_n)\}$, with instances $x_i \in \mathcal{X}$ and labels $y_i \in \{0, 1\}$.
$$c(B) = \begin{cases} 0, & \text{if } \sum_i y_i = 0 \\ 1, & \text{otherwise} \end{cases}$$
Further, MIL predicts the label of $B$ using a suitable transformation $f$ and a permutation-invariant transformation $g$:
$$c(B) = g(f(x_0), \cdots, f(x_{N-1}))$$
Depending on the choices of $f$ and $g$, there are two different ways of modeling MIL: instance-based and embedding-based.
Embedding-based methods usually achieve higher accuracy than instance-based methods, but it is harder to determine the key instances that trigger the classifier.
The embedding-based method produces a bag score based on a bag embedding directly supervised by the bag label and usually yields better accuracy compared to the instance-based method [52], however, it is usually harder to determine the key instances that trigger the classifier [30].
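A minimal sketch (not the paper's code) contrasting the two MIL formulations, assuming PyTorch and a bag of N instance embeddings of dimension L:

```python
import torch
import torch.nn as nn

L = 512
feats = torch.randn(100, L)   # one bag: N=100 instance embeddings h_i = f(x_i)

# Instance-based: score each instance first, then pool the scores (e.g. max).
inst_clf = nn.Linear(L, 1)
bag_score_instance = inst_clf(feats).max()

# Embedding-based: pool the embeddings first (e.g. mean), then score the bag embedding.
bag_clf = nn.Linear(L, 1)
bag_score_embedding = bag_clf(feats.mean(dim=0))
```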
Key innovations: the design of a novel aggregation function $g$ and the learning of the feature extractor $f$.
The aggregator consists of a masked non-local block and a max-pooling block for feature aggregation.
Input: feature embeddings obtained via self-supervised contrastive learning.
Given a bag $B = \{x_1, \cdots, x_n\}$, the feature extractor $f$ projects each instance $x_i$ to an embedding $h_i = f(x_i) \in \mathbb{R}^{L \times 1}$.
The first stream applies max-pooling to the instance scores:
$$c_m(B) = g_m(f(x_0), \cdots, f(x_{N-1})) = \max\{W_0 h_0, \cdots, W_0 h_{N-1}\}$$
$W_0$ is a weight vector. Since max-pooling is a permutation-invariant operation, this stream satisfies the MIL definition.
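A minimal sketch of this max-pooling stream, assuming instance embeddings `h` of shape (N, L); the class name and the modeling of $W_0$ as a single linear layer are my own choices, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class MaxPoolingStream(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.w0 = nn.Linear(feat_dim, 1)      # weight vector W_0

    def forward(self, h):                     # h: (N, L) instance embeddings
        scores = self.w0(h).squeeze(-1)       # per-instance scores W_0 h_i, shape (N,)
        c_m, m = scores.max(dim=0)            # bag score and index of the critical instance
        return c_m, m
```

The index `m` of the critical (highest-scoring) instance is reused by the second stream below.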
The second stream transforms each instance embedding $h_i$ into two vectors, a query $q_i \in \mathbb{R}^{L \times 1}$ and an information vector $v_i \in \mathbb{R}^{L \times 1}$:
$$q_i = W_q h_i, \quad v_i = W_v h_i, \quad i = 0, \cdots, N-1$$
where $W_q$ and $W_v$ are weight matrices.
Define a distance measurement $U$ between the critical instance and every other instance:
$$U(h_i, h_m) = \frac{\exp(\langle q_i, q_m \rangle)}{\sum_{k=0}^{N-1} \exp(\langle q_k, q_m \rangle)}$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors and $h_m$ is the embedding of the critical instance, i.e. the instance with the highest score in the max-pooling stream.
The bag embedding $b$ is a weighted element-wise sum of the information vectors $v_i$ of all instances, using the distances to the critical instance as weights:
$$b = \sum_{i=0}^{N-1} U(h_i, h_m)\, v_i$$
Bag score $c_b$:
$$c_b(B) = g_b(f(x_0), \cdots, f(x_{N-1})) = W_b \sum_{i=0}^{N-1} U(h_i, h_m)\, v_i = W_b\, b$$
$W_b$ is a weight vector for binary classification. This operation is similar to self-attention, except that the query-key matching is performed only between the critical instance and the other instances.
The inner product measures the similarity between two queries; a larger value means higher similarity. Therefore, instances more similar to the critical instance receive larger attention weights. The additional layer producing the information vectors $v_i$ allows contributing information to be extracted from each instance, and the softmax operation ensures that the attention weights sum to 1.
Since the selection of the critical instance does not depend on the order of the instances and $U$ is symmetric over the remaining instances, the bag embedding $b$ does not depend on the instance order. Therefore, the second stream is also permutation-invariant and satisfies the MIL definition.
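A minimal sketch of this second (attention) stream under the same assumptions as above: `h` is (N, L) and `m` is the critical-instance index from the max-pooling stream; the class and layer names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionStream(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int = 1):
        super().__init__()
        self.wq = nn.Linear(feat_dim, feat_dim)      # W_q
        self.wv = nn.Linear(feat_dim, feat_dim)      # W_v
        self.wb = nn.Linear(feat_dim, num_classes)   # W_b

    def forward(self, h, m):                 # h: (N, L), m: index of the critical instance
        q = self.wq(h)                       # queries q_i, (N, L)
        v = self.wv(h)                       # information vectors v_i, (N, L)
        # U(h_i, h_m): softmax over inner products with the critical query q_m
        attn = F.softmax(q @ q[m], dim=0)    # (N,)
        b = attn.unsqueeze(0) @ v            # bag embedding, (1, L)
        return self.wb(b).squeeze(0)         # bag score c_b
```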
The final bag score is the average of the two streams:
$$c(B) = \frac{1}{2}\Big(g_m(f(x_0), \cdots, f(x_{N-1})) + g_b(f(x_0), \cdots, f(x_{N-1}))\Big) = \frac{1}{2}\Big(W_0 h_m + W_b \sum_{i=0}^{N-1} U(h_i, h_m)\, v_i\Big)$$
For multi-class problems, the resulting bag embedding is a matrix $b \in \mathbb{R}^{L \times C}$, where $C$ is the number of classes; each column is a weighted sum of the information vectors $v_i$, and the final fully connected layer outputs $C$ channels. The information vectors $v_i$ enable feature selection across instances (the distance measurement weights features according to instance similarity). The resulting bag embedding has a fixed shape independent of the bag size and is used to compute the bag score $c_b$ (see Figure 3).
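Putting the two streams together, a minimal sketch of the dual-stream aggregator, reusing the `MaxPoolingStream` and `AttentionStream` sketches above (a hypothetical wrapper, not the repository's exact implementation):

```python
import torch
import torch.nn as nn

class DSMILSketch(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.stream_m = MaxPoolingStream(feat_dim)          # max-pooling stream
        self.stream_b = AttentionStream(feat_dim, num_classes=1)  # attention stream

    def forward(self, h):                  # h: (N, L) instance embeddings from f
        c_m, m = self.stream_m(h)          # bag score and critical instance index
        c_b = self.stream_b(h, m)          # attention-stream bag score
        return 0.5 * (c_m + c_b)           # final bag score: average of both streams

bag = torch.randn(100, 512)                # a bag of 100 patch embeddings
score = DSMILSketch(512)(bag)
```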
SimCLR: the feature extractor $f$ is trained on WSI patches with SimCLR self-supervised contrastive learning.
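A minimal sketch of SimCLR's NT-Xent (normalized temperature-scaled cross-entropy) loss, assuming `z1` and `z2` are the projected embeddings of two augmented views of the same batch of patches (not the repository's training code):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (B, D) projections of two augmented views of the same B patches."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit-norm rows
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-pairs
    B = z1.size(0)
    # the positive for row i is its augmented counterpart (i+B or i-B)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)
```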
Conclusion: MIL aggregator, self-supervised contrastive learning, multiscale features.
Outlook
@inproceedings{Li:2021:1431814328,
author = {Bin Li and Yin Li and Kevin W Eliceiri},
title = {Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning},
booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition},
pages = {14318--14328},
year = {2021},
url = {https://arxiv.org/abs/2011.08939}
}