
Fuzzy C-Means Clustering and the Fuzzy Partition Coefficient

Principle

Reference: https://www.cnblogs.com/xiaohuahua108/p/6187178.html?utm_source=itdadao&utm_medium=referral

The page above contains a few small mistakes, but is still worth consulting.

Installing skfuzzy

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scikit-fuzzy

Understanding the parameters via the help function

Function parameters

cmeans(data, c, m, error, maxiter, metric='euclidean', init=None, seed=None)
    Fuzzy c-means clustering algorithm [1].
    
    Parameters
    ----------
    data : 2d array, size (S, N)
        Data to be clustered.  N is the number of data sets; S is the number
        of features within each sample vector.
    c : int
        Desired number of clusters or classes.
    m : float
        Array exponentiation applied to the membership function u_old at each
        iteration, where U_new = u_old ** m.
    error : float
        Stopping criterion; stop early if the norm of (u[p] - u[p-1]) < error.
    maxiter : int
        Maximum number of iterations allowed.
    metric: string
        By default is set to euclidean. Passes any option accepted by
        ``scipy.spatial.distance.cdist``.
    init : 2d array, size (S, N)
        Initial fuzzy c-partitioned matrix. If none provided, algorithm is
        randomly initialized.
    seed : int
        If provided, sets random seed of init. No effect if init is
        provided. Mainly for debug/testing purposes. 
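Note the layout convention in the docstring: `data` is expected as (features, samples), the transpose of the usual (samples, features) layout, which is why the example below passes `data.T`. A tiny shape check in plain NumPy (variable names are illustrative):

```python
import numpy as np

# 300 samples with 2 features each, built row-per-sample
samples = np.random.rand(300, 2)

# cmeans expects shape (S, N) = (features, samples), so transpose first
data_for_cmeans = samples.T
print(data_for_cmeans.shape)  # (2, 300)
```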

Return values

   Returns
   -------
   cntr : 2d array, size (S, c)
       Cluster centers.  Data for each center along each feature provided
       for every cluster (of the `c` requested clusters).
   u : 2d array, (S, N)
       Final fuzzy c-partitioned matrix.
   u0 : 2d array, (S, N)
       Initial guess at fuzzy c-partitioned matrix (either provided init or
       random guess used if init was not provided).
   d : 2d array, (S, N)
       Final Euclidian distance matrix.
   jm : 1d array, length P
       Objective function history.
   p : int
       Number of iterations run.
   fpc : float
       Final fuzzy partition coefficient. 

Notes

Notes
 -----
 The algorithm implemented is from Ross et al. [1]_.
 
 Fuzzy C-Means has a known problem with high dimensionality datasets, where
 the majority of cluster centers are pulled into the overall center of
 gravity. If you are clustering data with very high dimensionality and
 encounter this issue, another clustering method may be required. For more
 information and the theory behind this, see Winkler et al. [2]_.
 
 References
 ----------
 .. [1] Ross, Timothy J. Fuzzy Logic With Engineering Applications, 3rd ed.
        Wiley. 2010. ISBN 978-0-470-74376-8 pp 352-353, eq 10.28 - 10.35.
 
 .. [2] Winkler, R., Klawonn, F., & Kruse, R. Fuzzy c-means in high
        dimensional spaces. 2012. Contemporary Theory and Pragmatic
        Approaches in Fuzzy Computing Utilization, 1.
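The membership update at the heart of the algorithm (eqs. 10.28-10.35 in Ross [1]) can be sketched in plain NumPy. This is a simplified illustration of a single iteration under my own naming and shape conventions, not the library's actual implementation:

```python
import numpy as np

def fcm_membership(data, centers, m=2.0, eps=1e-10):
    """One fuzzy membership update.

    data: (N, S) samples; centers: (c, S) cluster centers.
    Returns u of shape (c, N), where u[i, j] is the degree to
    which sample j belongs to cluster i (columns sum to 1).
    """
    # Pairwise Euclidean distances, shape (c, N); eps avoids division by zero
    d = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2) + eps
    # u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
    power = 2.0 / (m - 1.0)
    u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** power, axis=1)
    return u

data = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.2]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
u = fcm_membership(data, centers)
print(u.round(3))  # columns sum to 1; points near a center get high membership
```

A full FCM loop would alternate this update with the center update shown later in this article until the change in `u` falls below `error`.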

Hands-on example

Generating the data

import numpy as np
import matplotlib.pyplot as plt


def gen_clusters():
    """Draw 100 samples from each of three 2-D Gaussians."""
    mean1 = [0, 0]
    cov1 = [[1, 0], [0, 10]]   # elongated along y
    data = np.random.multivariate_normal(mean1, cov1, 100)

    mean2 = [10, 10]
    cov2 = [[10, 0], [0, 1]]   # elongated along x
    data = np.append(data,
                     np.random.multivariate_normal(mean2, cov2, 100),
                     0)

    mean3 = [10, 0]
    cov3 = [[3, 0], [0, 4]]
    data = np.append(data,
                     np.random.multivariate_normal(mean3, cov3, 100),
                     0)

    return np.round(data, 4)


def show_scatter(data):
    x, y = data.T
    plt.scatter(x, y)
    plt.axis()
    plt.title("scatter")
    plt.xlabel("x")
    plt.ylabel("y")


data = gen_clusters()
show_scatter(data)

(Figure: scatter plot of the three generated clusters)

FCM clustering

from skfuzzy.cluster import cmeans

center, u, u0, d, jm, p, fpc = cmeans(data.T, 3, m=2, error=1e-6, maxiter=20)

Assigning labels from the membership matrix

labels = np.argmax(u.T, axis=1)
plt.scatter(*data.T, c=labels, alpha = 0.5)
plt.plot(*center.T, 'ro')

(Figure: points colored by cluster label, with the cluster centers marked in red)

Analyzing the results

Decrease of the objective function

plt.plot(jm)

(Figure: objective function value decreasing over the iterations)
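Besides the objective history `jm`, the call also returns `fpc`, the fuzzy partition coefficient $F = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N} u_{ij}^2$, which ranges from $1/c$ (completely fuzzy) to $1$ (crisp partition); values near 1 indicate a clean clustering. It can be recomputed from any membership matrix directly. A sketch of the standard definition (the function name is mine; applied to the `u` returned above it should reproduce the returned `fpc`):

```python
import numpy as np

def fuzzy_partition_coefficient(u):
    """u: (c, N) membership matrix whose columns sum to 1."""
    return np.sum(u ** 2) / u.shape[1]

# Crisp partition -> FPC == 1
crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
# Completely fuzzy partition with c = 2 -> FPC == 1/2
fuzzy = np.full((2, 2), 0.5)
print(fuzzy_partition_coefficient(crisp))  # 1.0
print(fuzzy_partition_coefficient(fuzzy))  # 0.5
```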

Relationship between memberships and cluster centers

Each cluster center is the average of the data weighted by the $m$-th power of the memberships:

$$\mathbf{c}_i = \frac{\sum_{j=1}^{N} u_{ij}^m \,\mathbf{x}_j}{\sum_{j=1}^{N} u_{ij}^m}$$

>>> u_m = np.power(u,2)
>>> u_m @ data/np.sum(u_m,axis=1)[:,None]

array([[ 9.50909001,  9.93190823],
       [ 0.07773254,  0.0473052 ],
       [ 9.9406815 , -0.38315575]])

>>> center

array([[ 9.50909674,  9.93190816],
       [ 0.07773248,  0.04731199],
       [ 9.94067952, -0.38315793]])

Computing a Gaussian kernel radius for each cluster center

That is, the membership-weighted average distance from all points to each cluster center:

$$d_i = \frac{\sum_{j=1}^{N} u_{ij}^m \,\lVert \mathbf{x}_j - \mathbf{c}_i \rVert_2}{\sum_{j=1}^{N} u_{ij}^m}$$

D = np.sqrt(np.sum(np.square(data[:,None] - center), axis=2))
radius = [np.sum(u_m[i] @ D[:,i]) / np.sum(u_m[i]) for i in range(3)]

Plotting

from matplotlib.patches import Circle
import matplotlib.colors as colors

cdict = ['#5fbb44', '#f5f329', '#e50b32']
cmap = colors.ListedColormap(cdict, 'indexed')

circles = [Circle(xy = center[i], radius=radius[i], facecolor= cdict[i], alpha=0.5) for i in range(len(center))]

fig, ax = plt.subplots()

[ax.add_patch(cir) for cir in circles]

plt.scatter(data[:,0], data[:,1], c=labels, cmap=cmap,lw=1, s=10)
plt.plot(*center.T, 'ko')
plt.show()

(Figure: clusters with a circle of the computed radius drawn around each center)

Why weight by the m-th power of the membership?

Let's look at the distribution of membership values for one of the cluster centers:

plt.figure(figsize=(10,10))
_ = plt.hist(u[0], bins=50, label='$u$')
_ = plt.hist(u_m[0], bins=50, label='$u^2$')
plt.legend()

The figure shows this intuitively: $u^m$ squashes already-small values of $u$ toward 0. In other words, if a point's membership in a cluster is small to begin with, raising it to the $m$-th power effectively zeroes it out, excluding the point from that cluster.

(Figure: histograms of $u$ and $u^2$ for one cluster center)
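A quick numeric check of this squashing effect (the values are illustrative, not taken from the run above):

```python
import numpy as np

# Illustrative membership values
u_vals = np.array([0.9, 0.5, 0.1, 0.01])
print(u_vals ** 2)  # squaring shrinks small memberships far more than large ones

# With m = 2, the relative weight of a weak membership versus a strong one
# drops from 0.1/0.9 ~= 0.11 to 0.01/0.81 ~= 0.012
print((u_vals[2] / u_vals[0]) ** 2)
```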
