
The k-prototypes algorithm in Python: implementation and parameter reference

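k-prototypes is a classic clustering algorithm for mixed-type data, that is, data with both numerical and categorical attributes. To help researchers run mixed-data clustering in Python, the key parameters and usage of the kmodes package are reproduced below.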

The following material is taken from the author's GitHub repository:
https://github.com/nicodv/kmodes/blob/master/kmodes/kprototypes.py

The kmodes package provides a Python implementation of the k-prototypes algorithm. It is used in much the same way as the KMeans estimator in sklearn.

Training example:

from kmodes.kprototypes import KPrototypes  # i and x_train2 are defined by the caller
kp = KPrototypes(n_clusters=i, max_iter=80, n_init=8, n_jobs=5, verbose=2).fit(x_train2, categorical=[3, 4, 5, 7, 8, 9])

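The parameters are as follows (the Parameters section corresponds to the arguments inside the first pair of parentheses in the example, i.e. the constructor):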

Parameters
-----------
n_clusters : int, optional, default: 8
    The number of clusters to form as well as the number of centroids to generate.
max_iter : int, default: 100
    Maximum number of iterations of the algorithm for a single run.
num_dissim : func, default: euclidean_dissim
    Dissimilarity function used by the algorithm for numerical variables.
    Defaults to the Euclidean dissimilarity function.
cat_dissim : func, default: matching_dissim
    Dissimilarity function used by the k-modes algorithm for categorical variables.
    Defaults to the matching dissimilarity function.
n_init : int, default: 10
    Number of times the k-modes algorithm will be run with different
    centroid seeds. The final result will be the best output of
    n_init consecutive runs in terms of cost.
init : {'Huang', 'Cao', 'random' or a list of ndarrays}, default: 'Cao'
    Method for initialization:
    'Huang': Method in Huang [1997, 1998]
    'Cao': Method in Cao et al. [2009]
    'random': choose 'n_clusters' observations (rows) at random from
    data for the initial centroids.
    If a list of ndarrays is passed, it should be of length 2, with
    shapes (n_clusters, n_features) for numerical and categorical
    data respectively. These are the initial centroids.
gamma : float, default: None
    Weighting factor that determines relative importance of numerical vs.
    categorical attributes (see discussion in Huang [1997]). By default,
    automatically calculated from data.
verbose : integer, optional
    Verbosity mode.
random_state : int, RandomState instance or None, optional, default: None
    If int, random_state is the seed used by the random number generator;
    If RandomState instance, random_state is the random number generator;
    If None, the random number generator is the RandomState instance used by np.random.
n_jobs : int, default: 1
    The number of jobs to use for the computation. This works by computing
    each of the n_init runs in parallel.
    If -1 all CPUs are used. If 1 is given, no parallel computing code is
    used at all, which is useful for debugging. For n_jobs below -1,
    (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
    are used.
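
To make the roles of num_dissim, cat_dissim and gamma more concrete, here is a minimal NumPy sketch of the mixed dissimilarity discussed in Huang [1997]: a numerical term plus gamma times a categorical mismatch count. The helper name mixed_dissim and the toy values are hypothetical and chosen only for illustration. This is not the kmodes source, and it assumes the numerical term is a sum of squared differences without a square root.

```python
import numpy as np


def mixed_dissim(centroid_num, centroid_cat, x_num, x_cat, gamma):
    """Illustrative dissimilarity between one point and one centroid.

    Hypothetical helper for this article, not part of kmodes.
    """
    # Numerical part: sum of squared differences (assumed form, no square root)
    diff = np.asarray(x_num, dtype=float) - np.asarray(centroid_num, dtype=float)
    num_part = float(np.sum(diff ** 2))
    # Categorical part: simple matching, i.e. the number of attributes that differ
    cat_part = int(np.sum(np.asarray(x_cat) != np.asarray(centroid_cat)))
    # gamma sets how much one categorical mismatch weighs against numerical distance
    return num_part + gamma * cat_part


centroid_num, centroid_cat = [1.0, 2.0], ["red", "small"]
point_num, point_cat = [2.0, 4.0], ["red", "large"]

# Numerical part: (2-1)^2 + (4-2)^2 = 5; categorical part: 1 mismatch
d = mixed_dissim(centroid_num, centroid_cat, point_num, point_cat, gamma=0.5)
print(d)  # 5.5
```

A larger gamma makes categorical mismatches dominate the assignment of points to clusters, while a gamma close to zero makes the result depend almost entirely on the numerical columns.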

The arguments passed to fit during training are as follows:
Parameters
----------
X : array-like, shape=[n_samples, n_features]
categorical : Index of columns that contain categorical data
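
Because categorical takes positional column indices rather than column names, it can help to derive the index list from the names. The sketch below shows one way to do that in plain Python. The helper categorical_indices and the column names are hypothetical and are not part of kmodes.

```python
def categorical_indices(columns, categorical_names):
    """Return the positions of the categorical columns, in column order.

    Hypothetical helper for this article, not part of kmodes.
    """
    wanted = set(categorical_names)
    missing = wanted - set(columns)
    if missing:
        # Fail early instead of silently clustering on the wrong columns
        raise ValueError(f"unknown columns: {sorted(missing)}")
    return [i for i, name in enumerate(columns) if name in wanted]


# Made-up column layout of a mixed-type data set
columns = ["age", "income", "city", "gender"]
cat_idx = categorical_indices(columns, ["gender", "city"])
print(cat_idx)  # [2, 3]
```

The resulting list can then be passed as the categorical argument of fit or fit_predict.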

Example code for retrieving the training result:

label = kp.labels_

The other attributes available after fitting are as follows:
Attributes
----------
cluster_centroids_ : array, [n_clusters, n_features]
    Categories of cluster centroids
labels_ :
    Labels of each point
cost_ : float
    Clustering cost, defined as the sum distance of all points to
    their respective cluster centroids.
n_iter_ : int
    The number of iterations the algorithm ran for.
epoch_costs_ :
    The cost of the algorithm at each epoch from start to completion.
gamma : float
    The (potentially calculated) weighting factor.

Notes
-----
See:
Huang, Z.: Extensions to the k-modes algorithm for clustering large
data sets with categorical values, Data Mining and Knowledge
Discovery 2(3), 1998.

The original author also provides the following official example:

#!/usr/bin/env python

import timeit

import numpy as np

from kmodes.kprototypes import KPrototypes

# number of clusters
K = 20
# no. of points
N = int(1e5)
# no. of dimensions
M = 10
# no. of numerical dimensions
MN = 5
# no. of times test is repeated
T = 3

data = np.random.randint(1, 1000, (N, M))


def huang():
    KPrototypes(n_clusters=K, init='Huang', n_init=1, verbose=2)\
        .fit_predict(data, categorical=list(range(M - MN, M)))


def cao():
    KPrototypes(n_clusters=K, init='Cao', verbose=2)\
        .fit_predict(data, categorical=list(range(M - MN, M)))


if __name__ == '__main__':

    for cm in ('huang', 'cao'):
        print(cm.capitalize() + ': {:.2} seconds'.format(
            timeit.timeit(cm + '()',
                          setup='from __main__ import ' + cm,
                          number=T)))