Reference: https://www.cnblogs.com/xiaohuahua108/p/6187178.html?utm_source=itdadao&utm_medium=referral
The post above contains a few minor errors, but it is still worth consulting.
```
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scikit-fuzzy
```
The parameters can be inspected with Python's `help` function.
Function parameters
```
cmeans(data, c, m, error, maxiter, metric='euclidean', init=None, seed=None)
    Fuzzy c-means clustering algorithm [1].

    Parameters
    ----------
    data : 2d array, size (S, N)
        Data to be clustered. N is the number of data sets;
        S is the number of features within each sample vector.
    c : int
        Desired number of clusters or classes.
    m : float
        Array exponentiation applied to the membership function u_old at each
        iteration, where U_new = u_old ** m.
    error : float
        Stopping criterion; stop early if the norm of (u[p] - u[p-1]) < error.
    maxiter : int
        Maximum number of iterations allowed.
    metric : string
        By default is set to euclidean. Passes any option accepted by
        ``scipy.spatial.distance.cdist``.
    init : 2d array, size (S, N)
        Initial fuzzy c-partitioned matrix. If none provided, algorithm is
        randomly initialized.
    seed : int
        If provided, sets random seed of init. No effect if init is provided.
        Mainly for debug/testing purposes.
```
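One point worth highlighting before the examples below: `data` must have shape (S, N), i.e. features in rows and samples in columns, which is why the transposed array is passed later in this post. A minimal sketch of a call (the shapes and parameter values here are only illustrative):

```python
import numpy as np
from skfuzzy.cluster import cmeans

pts = np.random.rand(200, 2)   # 200 samples in the usual "row = sample" layout

# cmeans wants (S, N) = (features, samples), hence the transpose
cntr, u, u0, d, jm, p, fpc = cmeans(pts.T, c=2, m=2, error=1e-5, maxiter=100, seed=42)
print(cntr.shape, u.shape)     # one row per cluster center; one membership row per cluster
```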
Return values
```
Returns
-------
cntr : 2d array, size (S, c)
    Cluster centers. Data for each center along each feature provided for
    every cluster (of the `c` requested clusters).
u : 2d array, (S, N)
    Final fuzzy c-partitioned matrix.
u0 : 2d array, (S, N)
    Initial guess at fuzzy c-partitioned matrix (either provided init or
    random guess used if init was not provided).
d : 2d array, (S, N)
    Final Euclidian distance matrix.
jm : 1d array, length P
    Objective function history.
p : int
    Number of iterations run.
fpc : float
    Final fuzzy partition coefficient.
```
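Of the returned values, `fpc` (the fuzzy partition coefficient) is a handy scalar summary of how crisp the partition is: it is 1 for a hard partition and approaches 1/c when every point belongs equally to every cluster. A small sketch, assuming the usual Bezdek definition (the mean of the squared memberships), which can be compared against the returned `fpc`:

```python
import numpy as np

def partition_coefficient(u):
    """Mean squared membership over all points: 1 = crisp, 1/c = maximally fuzzy."""
    # u has one row per cluster and one column per point, so u.shape[1] is N
    return np.trace(u @ u.T) / u.shape[1]
```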
Notes
```
Notes
-----
The algorithm implemented is from Ross et al. [1]_.

Fuzzy C-Means has a known problem with high dimensionality datasets, where
the majority of cluster centers are pulled into the overall center of
gravity. If you are clustering data with very high dimensionality and
encounter this issue, another clustering method may be required. For more
information and the theory behind this, see Winkler et al. [2]_.

References
----------
.. [1] Ross, Timothy J. Fuzzy Logic With Engineering Applications, 3rd ed.
       Wiley. 2010. ISBN 978-0-470-74376-8 pp 352-353, eq 10.28 - 10.35.
.. [2] Winkler, R., Klawonn, F., & Kruse, R. Fuzzy c-means in high
       dimensional spaces. 2012. Contemporary Theory and Pragmatic Approaches
       in Fuzzy Computing Utilization, 1.
```
```python
import numpy as np
import matplotlib.pyplot as plt
from skfuzzy.cluster import cmeans   # fuzzy c-means from scikit-fuzzy


def gen_clusters():
    """Generate three 2-D Gaussian clusters of 100 points each."""
    mean1 = [0, 0]
    cov1 = [[1, 0], [0, 10]]
    data = np.random.multivariate_normal(mean1, cov1, 100)
    mean2 = [10, 10]
    cov2 = [[10, 0], [0, 1]]
    data = np.append(data,
                     np.random.multivariate_normal(mean2, cov2, 100),
                     0)
    mean3 = [10, 0]
    cov3 = [[3, 0], [0, 4]]
    data = np.append(data,
                     np.random.multivariate_normal(mean3, cov3, 100),
                     0)
    return np.round(data, 4)


def show_scatter(data):
    """Scatter plot of the raw data."""
    x, y = data.T
    plt.scatter(x, y)
    plt.axis()
    plt.title("scatter")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


data = gen_clusters()
show_scatter(data)

# cmeans expects features in rows and samples in columns, hence data.T
center, u, u0, d, jm, p, fpc = cmeans(data.T, 3, m=2, error=1e-6, maxiter=20)
```
Assign labels according to the membership degrees: each point goes to the cluster in which its membership is highest.
```python
labels = np.argmax(u.T, axis=1)            # most likely cluster for each point
plt.scatter(*data.T, c=labels, alpha=0.5)  # points colored by hardened label
plt.plot(*center.T, 'ro')                  # cluster centers in red
plt.show()

plt.plot(jm)                               # objective-function history per iteration
plt.show()
```
The cluster centers are the weighted averages of the original data, with the memberships raised to the power $m$ as weights:

$$\mathbf{c}_i = \frac{\sum_{j=1}^N u_{ij}^m \mathbf{x}_j}{\sum_{j=1}^N u_{ij}^m}$$
```python
>>> u_m = np.power(u, 2)                        # memberships raised to the power m = 2
>>> u_m @ data / np.sum(u_m, axis=1)[:, None]   # membership-weighted average of the data
array([[ 9.50909001,  9.93190823],
       [ 0.07773254,  0.0473052 ],
       [ 9.9406815 , -0.38315575]])
>>> center
array([[ 9.50909674,  9.93190816],
       [ 0.07773248,  0.04731199],
       [ 9.94067952, -0.38315793]])
```
We can also compute, for each cluster, the membership-weighted average distance from all points to its center, used below as a rough cluster radius:
$$\mathbf{d}_i = \frac{\sum_{j=1}^N u_{ij}^m \, \lVert \mathbf{x}_j - \mathbf{c}_i \rVert_2}{\sum_{j=1}^N u_{ij}^m}$$
```python
# D[j, i] is the Euclidean distance from point j to center i, shape (N, c)
D = np.sqrt(np.sum(np.square(data[:, None] - center), axis=2))
# Membership-weighted average distance to each center
radius = [u_m[i] @ D[:, i] / np.sum(u_m[i]) for i in range(3)]
```
Plot the clusters, drawing a circle of that radius around each center:
```python
from matplotlib.patches import Circle
import matplotlib.colors as colors

cdict = ['#5fbb44', '#f5f329', '#e50b32']
cmap = colors.ListedColormap(cdict, 'indexed')

# One translucent circle per cluster, centered on the cluster center
circles = [Circle(xy=center[i], radius=radius[i], facecolor=cdict[i], alpha=0.5)
           for i in range(len(center))]

fig, ax = plt.subplots()
[ax.add_patch(cir) for cir in circles]
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap=cmap, lw=1, s=10)
plt.plot(*center.T, 'ko')
plt.show()
```
Now look at the distribution of membership values for one of the cluster centers:
```python
plt.figure(figsize=(10, 10))
_ = plt.hist(u[0], bins=50, label='$u$')      # raw memberships in cluster 0
_ = plt.hist(u_m[0], bins=50, label='$u^2$')  # memberships raised to the power m
plt.legend()
plt.show()
```
The figure makes the effect of the exponent clear: $u^m$ squashes already-small membership values $u$ toward 0. In other words, if a point's membership in a cluster is small to begin with, it is effectively set to zero, and the point simply does not belong to that cluster.
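A quick numeric illustration of that squashing effect (the membership values here are made up):

```python
import numpy as np

u_vals = np.array([0.05, 0.30, 0.90])  # hypothetical membership values
print(u_vals ** 2)                     # [0.0025 0.09   0.81  ] -- small memberships shrink the most
```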