赞
踩
根据上面的流程简要做个概述:
import numpy as np def cal_l2_distance_two_loops(test_X, train_X): """ 计算L2距离,两层循环 :return: """ num_test = test_X.shape[0] num_train = train_X.shape[0] dists = np.zeros((num_test, num_train)) for i in range(num_test): for j in range(num_train): test_line = test_X[i] train_line = train_X[j] temp = np.subtract(test_line, train_line) temp = np.power(temp, 2) dists[i][j] = np.sqrt(temp.sum()) return dists
import numpy as np
def cal_l2_distances_one_loop(test_X, train_X):
"""
计算L2距离,一层循环
:return:
"""
num_test = test_X.shape[0]
num_train = train_X.shape[0]
dists = np.zeros((num_test, num_train))
for i in range(num_test):
dists[i] = np.sqrt(np.sum(np.square(train_X - test_X[i]), axis=1)).T
return dists
运算效率最高的算法是将训练集和测试集都使用矩阵表示,然后使用矩阵运算的方法替代之前的循环操作。但此操作需要我们对矩阵的运算规则非常熟悉。接下来着重记录如何计算两个矩阵之间的欧式距离。
记录测试集矩阵P的大小为MD,训练集矩阵C的大小为ND(测试集中共有M个点,每个点为D维特征向量。训练集中共有N个点,每个点为D维特征向量)
记
P
i
P_i
Pi是P的第i行,记
P
i
P_i
Pi是C的第j行
P
i
=
[
P
i
1
P
i
2
⋯
P
i
D
]
P_i = [ P_{i1}\quad P_{i2} \cdots P_{iD}]
Pi=[Pi1Pi2⋯PiD]
C
j
=
[
C
j
1
C
j
2
⋯
C
j
D
]
C_j = [ C_{j1} C_{j2} \cdots \quad C_{jD}]
Cj=[Cj1Cj2⋯CjD]
首先计算
P
i
P_i
Pi和
C
j
C_j
Cj之间的距离dist(i,j)
d
(
P
i
,
C
j
)
=
(
P
i
1
−
C
j
1
)
2
+
(
P
i
2
−
C
j
2
)
2
+
⋯
+
(
P
i
D
−
C
j
D
)
2
=
(
P
i
1
2
+
P
i
2
2
+
⋯
+
P
i
D
2
)
+
(
C
j
1
2
+
C
j
2
2
+
⋯
+
C
j
D
2
)
−
2
×
(
P
i
1
C
j
1
+
P
i
2
C
j
2
+
⋯
+
P
i
D
C
i
D
)
=
∥
P
i
∥
2
+
∥
C
j
∥
2
−
2
×
P
i
C
j
T
d(P_i,C_j) = \sqrt{(P_{i1}-C_{j1})^2+(P_{i2}-C_{j2})^2+\cdots+(P_{iD}-C_{jD})^2}\\ =\sqrt{(P_{i1}^2+P_{i2}^2+\cdots+P_{iD}^2)+(C_{j1}^2+C_{j2}^2+\cdots+C_{jD}^2)-2\times(P_{i1}C_{j1}+P_{i2}C_{j2}+\cdots+P_{iD}C_{iD})}\\=\sqrt{\left \| P_i\right \|^2+\left\|C_j\right\|^2-2\times P_iC_j^T}
d(Pi,Cj)=(Pi1−Cj1)2+(Pi2−Cj2)2+⋯+(PiD−CjD)2
=(Pi12+Pi22+⋯+PiD2)+(Cj12+Cj22+⋯+CjD2)−2×(Pi1Cj1+Pi2Cj2+⋯+PiDCiD)
=∥Pi∥2+∥Cj∥2−2×PiCjT
我们可以推广到距离矩阵的第i行的计算公式
d
i
s
t
[
i
]
=
(
∥
P
i
∥
2
∥
P
i
∥
2
⋯
∥
P
i
∥
2
)
+
(
∥
C
1
∥
2
∥
C
2
∥
2
⋯
∥
C
N
∥
2
)
−
2
×
P
i
(
C
1
T
C
2
T
⋯
C
N
T
)
=
(
∥
P
i
∥
2
∥
P
i
∥
2
⋯
∥
P
i
∥
2
)
+
(
∥
C
1
∥
2
∥
C
2
∥
2
⋯
∥
C
N
∥
2
)
−
2
×
P
i
C
T
dist[i]=\sqrt{(\left\|P_i\right\|^2\quad \left\|P_i\right\|^2 \cdots \left\|P_i\right\|^2)+(\left\|C_1\right\|^2 \quad \left\|C_2\right\|^2 \cdots \left\|C_N\right\|^2)-2\times P_i(C_1^T \quad C_2^T \cdots C_N^T)}\\=\sqrt{(\left\|P_i\right\|^2\quad \left\|P_i\right\|^2 \cdots \left\|P_i\right\|^2)+(\left\|C_1\right\|^2 \quad \left\|C_2\right\|^2 \cdots \left\|C_N\right\|^2)-2\times P_iC^T}
dist[i]=(∥Pi∥2∥Pi∥2⋯∥Pi∥2)+(∥C1∥2∥C2∥2⋯∥CN∥2)−2×Pi(C1TC2T⋯CNT)
=(∥Pi∥2∥Pi∥2⋯∥Pi∥2)+(∥C1∥2∥C2∥2⋯∥CN∥2)−2×PiCT
注意:这里公式中的第二项为什么是
∥
C
j
∥
2
\left\|C_j\right\|^2
∥Cj∥2呢,因为算的是i行的距离。即测试样本的第i行到训练样本中每一个点的距离。如果对这个不是很清楚的同学,可以看看下面参看文章中的第三个链接。
继续将公式推广为整个距离矩阵
d
i
s
t
=
(
∥
P
1
∥
2
∥
P
1
∥
2
⋯
∥
P
1
∥
2
∥
P
2
∥
2
∥
P
2
∥
2
⋯
∥
P
2
∥
2
⋮
⋮
⋱
⋮
∥
P
M
∥
2
∥
P
M
∥
2
⋯
∥
P
M
∥
2
)
+
(
∥
C
1
∥
2
∥
C
2
∥
2
⋯
∥
C
N
∥
2
∥
C
1
∥
2
∥
C
2
∥
2
⋯
∥
C
N
∥
2
⋮
⋮
⋱
⋮
∥
C
1
∥
2
∥
C
2
∥
2
⋯
∥
C
N
∥
2
)
−
2
×
P
C
T
dist=\sqrt{
通过矩阵计算两个之前的欧式距离的代码,参看博客中的代码好像都有问题,下面是我根据下面的公式,自己写的一个。
import numpy as np
def cal_l2_distances_no_loops(test_X, train_X):
"""
计算L2距离,通过矩阵运算
:return:
"""
first = np.sum(np.square(test_X), axis=1)
second = np.sum(np.square(train_X), axis=1).T
# 注意这里的np.dot(test_X, train_X.T)中的test_X, train_X位置是和公式中的顺序保持一致的
three = -2 * np.dot(test_X, train_X.T)
dists = np.sqrt(first + second + three)
return dists
import numpy as np class MatrixDistance(object): """计算两个矩阵之的L2距离(欧式距离)""" def __init__(self, train_X, test_X): self.train_X = train_X self.test_X = test_X def cal_l2_distance_two_loops(self): """ 计算L2距离,两层循环 :return: """ num_test = self.test_X.shape[0] num_train = self.train_X.shape[0] dists = np.zeros((num_test, num_train)) for i in range(num_test): for j in range(num_train): test_line = self.test_X[i] train_line = self.train_X[j] temp = np.subtract(test_line, train_line) temp = np.power(temp, 2) dists[i][j] = np.sqrt(temp.sum()) return dists def cal_l2_distances_one_loop(self): """ 计算L2距离,一层循环 :return: """ num_test = self.test_X.shape[0] num_train = self.train_X.shape[0] dists = np.zeros((num_test, num_train)) for i in range(num_test): dists[i] = np.sqrt(np.sum(np.square(self.train_X - self.test_X[i]), axis=1)).T return dists def cal_l2_distances_no_loops(self): """ 计算L2距离,通过矩阵运算 :return: """ first = np.sum(np.square(self.test_X), axis=1) second = np.sum(np.square(self.train_X), axis=1).T # 注意这里的np.dot(self.test_X, self.train_X.T)中的test_X, train_X位置是和前面的顺序保持一致的 three = -2 * np.dot(self.test_X, self.train_X.T) dists = np.sqrt(first + second + three) return dists if __name__ == '__main__': train_x = np.matrix(np.arange(12).reshape(3, 4)) test_x = np.matrix(np.arange(2, 14).reshape(3, 4)) d = MatrixDistance(train_x, test_x) print(d.cal_l2_distance_two_loops()) print(d.cal_l2_distances_one_loop()) print(d.cal_l2_distances_no_loops())
参考文章:
计算两个矩阵之间的欧式距离
k近邻算法
K-近邻算法介绍与代码实现
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。