This post collects some of the distance functions and kernel functions used in multi-instance learning (MIL). Among the distance functions, the Hausdorff distance treats a bag as a point set, while the remaining ones take the bag's probability distribution into account.
The notation used in this post is as follows:

| Symbol | Meaning |
|---|---|
| $A$ | a bag |
| $\vert A \vert$ | bag size |
| $\boldsymbol{a}$ | an instance in bag $A$ |
| $d$ | distance between bags |
| $s$ | similarity between bags |
| $k$ | kernel between bags |
First, the difference between a similarity function $s$ and a distance function $d$ needs to be made clear: simply put, the closer two bags are, the smaller their distance and the higher their similarity. According to [1], one can be converted into the other.

A function built on a similarity function is called a kernel function $k$; a kernel function and a distance function can be related as follows:

$$d(A, B) = \sqrt{k(A, A) - 2 k(A, B) + k(B, B)}.$$
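To make this conversion concrete, here is a minimal NumPy sketch. The set kernel used (the mean of pairwise RBF instance kernels) is an illustrative assumption, not a kernel prescribed above:

```python
import numpy as np

def rbf_bag_kernel(A, B, gamma=1.0):
    # Illustrative set kernel: mean of pairwise RBF instance kernels.
    # A, B are arrays of shape (n_instances, n_features).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq).mean()

def kernel_to_distance(A, B, k=rbf_bag_kernel):
    # d(A, B) = sqrt(k(A, A) - 2 k(A, B) + k(B, B)); the max() guards
    # against tiny negative values caused by floating-point error.
    return np.sqrt(max(k(A, A) - 2 * k(A, B) + k(B, B), 0.0))
```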
The maximum Hausdorff distance is defined as:

$$d^{\max}(A, B) = \max \{ h(A, B), h(B, A) \},$$

where

$$h(A, B) = \max_{\boldsymbol{a} \in A} \min_{\boldsymbol{b} \in B} d(\boldsymbol{a}, \boldsymbol{b}).$$

This distance is easily affected by outlying instances.
The minimum Hausdorff distance is defined as:

$$d^{\min}(A, B) = \min_{\boldsymbol{a} \in A, \boldsymbol{b} \in B} d(\boldsymbol{a}, \boldsymbol{b}).$$
The average Hausdorff distance is defined as:

$$d^{\rm ave}(A, B) = \frac{\sum\limits_{\boldsymbol{a} \in A} \min\limits_{\boldsymbol{b} \in B} d(\boldsymbol{a}, \boldsymbol{b}) + \sum\limits_{\boldsymbol{b} \in B} \min\limits_{\boldsymbol{a} \in A} d(\boldsymbol{b}, \boldsymbol{a})}{\vert A \vert + \vert B \vert}.$$
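For reference, a minimal NumPy sketch of the three variants above; the bag representation (arrays of shape `(n_instances, n_features)`) and the Euclidean instance distance are assumptions for illustration:

```python
import numpy as np

def pairwise_dist(A, B):
    # Euclidean distance between every instance of A and every instance of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def max_hausdorff(A, B):
    D = pairwise_dist(A, B)
    h_ab = D.min(axis=1).max()  # h(A, B)
    h_ba = D.min(axis=0).max()  # h(B, A)
    return max(h_ab, h_ba)

def min_hausdorff(A, B):
    return pairwise_dist(A, B).min()

def ave_hausdorff(A, B):
    D = pairwise_dist(A, B)
    return (D.min(axis=1).sum() + D.min(axis=0).sum()) / (len(A) + len(B))
```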
Zafra et al. [5] argue that the distance should be computed differently between two positive bags, between a positive and a negative bag, and between two negative bags, hence:

$$d^{\rm ada}(A, B) = \begin{cases} d^{\min}(A, B), & \text{two positive bags}; \\ d^{\max}(A, B), & \text{a positive and a negative bag}; \\ d^{\rm ave}(A, B), & \text{two negative bags}. \end{cases}$$
To mitigate the influence of outlying instances on the maximum Hausdorff distance, the $k$-th ranked Hausdorff distance modifies $h(A, B)$ to:

$$h(A, B) = \mathop{k^{th}}_{\boldsymbol{a} \in A} \min_{\boldsymbol{b} \in B} d(\boldsymbol{a}, \boldsymbol{b}),$$

where $k^{th}$ denotes taking the $k$-th largest value; naturally, with $k = 1$ this reduces to the maximum Hausdorff distance.
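A small sketch of this modification, reusing `pairwise_dist` from the earlier snippet (an illustrative helper, not code from the cited papers):

```python
import numpy as np

def kth_hausdorff_h(A, B, k=2):
    # h(A, B) with max replaced by the k-th largest of the per-instance
    # minimum distances; k = 1 recovers the original h(A, B).
    mins = pairwise_dist(A, B).min(axis=1)  # min_b d(a, b) for each a in A
    return np.sort(mins)[::-1][k - 1]       # k-th largest value
```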
OWA (ordered weighted averaging) operators [6] can replace the $\max$ and $\min$ operations in the Hausdorff distance to improve its performance.

Given a sequence $V$ of length $m$, OWA first sorts it in descending order to obtain $C$, then assigns each position a weight $w_i \in [0, 1]$ with $\sum_{i = 1}^m w_i = 1$. The weighted average is computed as:

$${\rm OWA}_W(V) = \sum_{i = 1}^m w_i c_i,$$

where $W = \langle w_1, \cdots, w_m \rangle$ is the weight vector and $C = \langle c_1, \cdots, c_m \rangle$ holds the values sorted in descending order.

The OWA operation can be viewed as a generalization of arbitrary aggregation operations over a set of values. For example, to take the maximum of $V$, one simply assigns the weights $W = \langle 1, 0, \cdots, 0 \rangle$.
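A minimal sketch of the OWA operator itself (NumPy assumed, helper name hypothetical):

```python
import numpy as np

def owa(values, weights):
    # OWA_W(V): sort the values in descending order, then weight and sum.
    c = np.sort(np.asarray(values, dtype=float))[::-1]  # c_1 >= c_2 >= ...
    w = np.asarray(weights, dtype=float)
    assert c.shape == w.shape and np.isclose(w.sum(), 1.0)
    return float(c @ w)

v = [3.0, 1.0, 2.0]
print(owa(v, [1, 0, 0]))  # 3.0: W = <1, 0, ..., 0> recovers max
print(owa(v, [0, 0, 1]))  # 1.0: the reversed weights recover min
```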
To obtain the OWA-based Hausdorff distances, $h(A, B)$ is replaced with:

$$h^{\rm OWA}(A, B) = {\rm OWA}_{W_{\max}} \underbrace{{\rm OWA}_{W_{\min}} \underbrace{\| \boldsymbol{a} - \boldsymbol{b} \|}_{\boldsymbol{b} \in B}}_{\boldsymbol{a} \in A}.$$

and the average variant becomes:

$$d_{\rm OWA}^{\rm ave}(A, B) = \frac{\sum\limits_{\boldsymbol{a} \in A} {\rm OWA}_{W_{\min}} \underbrace{\| \boldsymbol{a} - \boldsymbol{b} \|}_{\boldsymbol{b} \in B} + \sum\limits_{\boldsymbol{b} \in B} {\rm OWA}_{W_{\min}} \underbrace{\| \boldsymbol{a} - \boldsymbol{b} \|}_{\boldsymbol{a} \in A}}{\vert A \vert + \vert B \vert}.$$
The weight settings can be found in [7]; there are the following two versions:

$$W_{\max-L} = \left\langle \frac{2}{m+1}, \frac{2(m-1)}{m(m+1)}, \cdots, \frac{4}{m(m+1)}, \frac{2}{m(m+1)} \right\rangle.$$

$$W_{\max-IA} = \left\langle \frac{1}{1 \sum\limits_{i=1}^{m} \frac{1}{i}}, \frac{1}{2 \sum\limits_{i=1}^{m} \frac{1}{i}}, \cdots, \frac{1}{(m-1) \sum\limits_{i=1}^{m} \frac{1}{i}}, \frac{1}{m \sum\limits_{i=1}^{m} \frac{1}{i}} \right\rangle.$$

For $W_{\min}$, one simply reverses the above weight vectors.
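Putting the pieces together, a sketch that builds the two weight schemes and the resulting $h^{\rm OWA}$, reusing `owa` and `pairwise_dist` from the snippets above:

```python
import numpy as np

def w_max_linear(m):
    # W_{max-L}: linearly decreasing weights, w_i = 2(m - i + 1) / (m(m + 1)).
    i = np.arange(1, m + 1)
    return 2.0 * (m - i + 1) / (m * (m + 1))

def w_max_inv_additive(m):
    # W_{max-IA}: w_i = 1 / (i * sum_{j=1}^m 1/j).
    i = np.arange(1, m + 1)
    return 1.0 / (i * (1.0 / i).sum())

def h_owa(A, B, w_max=w_max_linear):
    # Soft minimum over B for each instance of A, then a soft maximum over A.
    D = pairwise_dist(A, B)
    w_min_B = w_max(len(B))[::-1]  # W_min is the reversed W_max
    inner = [owa(row, w_min_B) for row in D]
    return owa(inner, w_max(len(A)))
```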
The earth mover's distance (EMD) [8] is defined as:

$$d^{\rm EMD}(A, B) = \sum_{\boldsymbol{a} \in A, \boldsymbol{b} \in B} f(\boldsymbol{a}, \boldsymbol{b}) \, d(\boldsymbol{a}, \boldsymbol{b}),$$

where $d(\boldsymbol{a}, \boldsymbol{b})$ is the Euclidean distance and $f(\boldsymbol{a}, \boldsymbol{b})$ is the flow that minimizes the total distance, subject to the following conditions:
$$\min d^{\rm EMD}(A, B) \quad {\rm s.t.} \quad \begin{cases} f(\boldsymbol{a}, \boldsymbol{b}) \geq 0, \\ \sum\limits_{\boldsymbol{a} \in A} f(\boldsymbol{a}, \boldsymbol{b}) \leq \frac{1}{n_b}, \\ \sum\limits_{\boldsymbol{b} \in B} f(\boldsymbol{a}, \boldsymbol{b}) \leq \frac{1}{n_a}, \\ \sum\limits_{\boldsymbol{a} \in A} \sum\limits_{\boldsymbol{b} \in B} f(\boldsymbol{a}, \boldsymbol{b}) = 1, \end{cases}$$

where $n_a = \vert A \vert$ and $n_b = \vert B \vert$.
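This flow problem is a small linear program. Below is a sketch using `scipy.optimize.linprog`, assuming the uniform instance weights $1/n_a$ and $1/n_b$ from the constraints above:

```python
import numpy as np
from scipy.optimize import linprog

def emd(A, B):
    # Solve for the flow matrix f (flattened row-major) that minimizes
    # sum f(a, b) d(a, b) under the constraints listed above.
    n_a, n_b = len(A), len(B)
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    c = D.ravel()

    # sum_b f(a, b) <= 1/n_a, one row per instance a.
    rows = np.zeros((n_a, n_a * n_b))
    for i in range(n_a):
        rows[i, i * n_b:(i + 1) * n_b] = 1.0
    # sum_a f(a, b) <= 1/n_b, one row per instance b.
    cols = np.zeros((n_b, n_a * n_b))
    for j in range(n_b):
        cols[j, j::n_b] = 1.0

    res = linprog(
        c,
        A_ub=np.vstack([rows, cols]),
        b_ub=np.concatenate([np.full(n_a, 1.0 / n_a), np.full(n_b, 1.0 / n_b)]),
        A_eq=np.ones((1, n_a * n_b)),  # total flow equals 1
        b_eq=[1.0],
        bounds=(0, None),              # f(a, b) >= 0
    )
    return res.fun
```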
The Mahalanobis distance for MIL is defined as follows: each bag's distribution is described by a Gaussian with mean $\mu$ and covariance matrix $\Sigma$:

$$d^{\rm MD}(A, B) = (\mu_a - \mu_b)^{\rm T} \left( \frac{1}{2} \Sigma_a + \frac{1}{2} \Sigma_b \right)^{-1} (\mu_a - \mu_b).$$

The limitation of this distance is that when a bag contains few instances but the dimensionality is high, inverting the covariance matrix becomes difficult or even impossible.
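A minimal NumPy sketch; the small ridge term added to the pooled covariance is an assumption to keep it invertible for small bags, not part of the formula above:

```python
import numpy as np

def mahalanobis_bag(A, B, reg=1e-6):
    # Summarize each bag by its mean and covariance, then apply d^MD.
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    cov = 0.5 * np.cov(A, rowvar=False) + 0.5 * np.cov(B, rowvar=False)
    cov += reg * np.eye(A.shape[1])  # ridge: guards against singularity
    diff = mu_a - mu_b
    return float(diff @ np.linalg.solve(cov, diff))
```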
The Cauchy-Schwarz divergence (CSD) tries to address the concern that the Gaussian assumption may be too restrictive:

$$d^{\rm CSD}(A, B) = -\log \left( \frac{K_{\sigma_a + \sigma_b}(A, B)}{\left( K_{2\sigma_a}(A, A) \, K_{2\sigma_b}(B, B) \right)^{1/2}} \right),$$

where

$$K_{\sigma}(A, B) = \sum_{\boldsymbol{a} \in A, \boldsymbol{b} \in B} \frac{\exp \left( -\frac{1}{2\sigma^2} (\boldsymbol{a} - \boldsymbol{b})^{\rm T} (\boldsymbol{a} - \boldsymbol{b}) \right)}{(2\pi\sigma^2)^{d/2}}.$$

When $A$ and $B$ come from the same dataset, $\sigma_a = \sigma_b$ [9][10].
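A literal NumPy transcription of the two formulas above; the bandwidth arguments are passed exactly as the subscripts are written, which is an assumption about the cited papers' notation:

```python
import numpy as np

def k_sigma(A, B, sigma):
    # K_sigma(A, B): Gaussian kernel summed over all instance pairs.
    d = A.shape[1]
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return (np.exp(-sq / (2.0 * sigma ** 2))
            / (2.0 * np.pi * sigma ** 2) ** (d / 2.0)).sum()

def csd(A, B, sigma_a, sigma_b):
    # d^CSD(A, B) with the subscripts transcribed literally.
    num = k_sigma(A, B, sigma_a + sigma_b)
    den = np.sqrt(k_sigma(A, A, 2.0 * sigma_a) * k_sigma(B, B, 2.0 * sigma_b))
    return float(-np.log(num / den))
```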
For more details, see: https://blog.csdn.net/weixin_44575152/article/details/106253273.

For more details, see: https://blog.csdn.net/weixin_44575152/article/details/105624719.
1. E. Deza, M.-M. Deza, Dictionary of Distances, Elsevier, 2006. doi:10.1016/B978-0-444-52087-6.X5000-8.
2. G. Edgar, Measure, Topology, and Fractal Geometry, 3rd print., Springer-Verlag, Berlin, 1995.
3. J. Wang, J.-D. Zucker, Solving the multiple-instance problem: A lazy learning approach, in: Proceedings of the 17th International Conference on Machine Learning, 2000, pp. 1119–1125. URL: http://cogprints.org/2124/
4. M.-L. Zhang, Z.-H. Zhou, Multi-instance clustering with applications to multi-instance prediction, Applied Intelligence 31 (1) (2009) 47–68.
5. A. Zafra, M. Pechenizkiy, S. Ventura, ReliefF-MI: An extension of ReliefF to multiple instance learning, Neurocomputing 75 (1) (2012) 210–218. doi:10.1016/j.neucom.2011.03.052. URL: http://www.sciencedirect.com/science/article/pii/S0925231211003997
6. R. R. Yager, J. Kacprzyk, The Ordered Weighted Averaging Operators: Theory and Applications, Springer Science & Business Media, 2012.
7. S. Vluymans, D. S. Tarragó, Y. Saeys, C. Cornelis, F. Herrera, Fuzzy multi-instance classifiers, IEEE Transactions on Fuzzy Systems 24 (6) (2016) 1395–1409.
8. Y. Rubner, C. Tomasi, L. J. Guibas, The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision 40 (2) (2000) 99–121.
9. V. Cheplygina, D. M. J. Tax, M. Loog, Multiple instance learning with bag dissimilarities, Pattern Recognition 48 (1) (2015) 264–275.
10. M. Budka, B. Gabrys, K. Musial, On accuracy of PDF divergence estimators and their applicability to representative data sampling, Entropy 13 (7) (2011) 1229–1266.