赞
踩
欢迎关注笔者的微信公众号
在机器学习中有这样一种场景,需要对已知数据按照一定的关系归到不同的类别中(无监督)
k-means是比较流行的聚类方法
其基本算法流程如下:
# Author: Phil Roth <mr.phil.roth@gmail.com> # License: BSD 3 clause import numpy as np import matplotlib.pyplot as plt from sklearn.cluster import KMeans from sklearn.datasets import make_blobs plt.figure(figsize=(12, 12)) n_samples = 1500 random_state = 170 X, y = make_blobs(n_samples=n_samples, random_state=random_state) # Incorrect number of clusters y_pred = KMeans(n_clusters=2, random_state=random_state).fit_predict(X) plt.subplot(221) plt.scatter(X[:, 0], X[:, 1], c=y_pred) plt.title("Incorrect Number of Blobs") # Anisotropicly distributed data transformation = [[0.60834549, -0.63667341], [-0.40887718, 0.85253229]] X_aniso = np.dot(X, transformation) y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_aniso) plt.subplot(222) plt.scatter(X_aniso[:, 0], X_aniso[:, 1], c=y_pred) plt.title("Anisotropicly Distributed Blobs") # Different variance X_varied, y_varied = make_blobs(n_samples=n_samples, cluster_std=[1.0, 2.5
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。