
K-means on the Watermelon Dataset 4.0 (Python Implementation)

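This post applies the k-means clustering algorithm to the watermelon dataset, grouping the watermelons by their density and sugar content.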
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation


def kmeans(data, center_ids, max_err=0.0001, max_round=30):
    """Run k-means and yield (clusters, centers, round) after every round."""
    init_centers = [data[idx, :] for idx in center_ids]
    n = len(center_ids)
    error, rounds = 1.0, 0
    while error > max_err and rounds < max_round:
        rounds += 1
        # Assignment step: put each sample index into the cluster of its nearest center.
        clusters = [[] for _ in range(n)]
        for j in range(len(data)):
            dist = [np.linalg.norm(data[j, :] - init_centers[i]) for i in range(n)]
            clusters[int(np.argmin(dist))].append(j)
        # Update step: move each center to the mean of its cluster.
        new_center = [None] * n
        error = 0.0
        for i in range(n):
            if clusters[i]:
                new_center[i] = np.mean(data[clusters[i], :], axis=0)
            else:
                # Empty cluster: keep the old center instead of dividing by zero.
                new_center[i] = init_centers[i]
            # Accumulate how far each center moved; the loop stops once the total is small.
            error += np.linalg.norm(new_center[i] - init_centers[i])
            init_centers[i] = new_center[i]
        # Yielding inside the loop makes kmeans a generator, so the caller sees the clustering after every round.
        yield clusters, new_center, rounds


data = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.748, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459]])
init_centers = [12, 22]  # ids of the samples used as initial centers; their count is also the number of clusters

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.set_xlim(0, 1)
ax.set_ylim(0, 0.6)
ax.set_ylabel('sugar')
ax.set_xlabel('density')

imgs = []
dye = ['red', 'orange', 'green', 'blue', 'pink']  # one color per cluster, so at most 5 clusters
for cluster, center, rounds in kmeans(data, init_centers):  # store the artists of every round in imgs
    # A text artist inside the axes is redrawn per frame; ax.set_title would only keep the last round.
    pics = [ax.text(0.5, 0.95, 'clusters in %s rounds' % rounds,
                    transform=ax.transAxes, ha='center')]
    for i, li in enumerate(cluster):
        pics.append(ax.scatter(data[li, 0], data[li, 1], c=dye[i]))
        pics.append(ax.scatter(center[i][0], center[i][1], s=45, c='gray', marker='s'))
    imgs.append(pics)
imgs.insert(0, [ax.scatter(data[:, 0], data[:, 1], c='k')])  # first frame: the unclustered samples
A = animation.ArtistAnimation(fig, imgs, interval=1000, blit=True, repeat_delay=500)
# Save before plt.show(), which blocks until the window is closed.
# The 'imagemagick' writer requires ImageMagick to be installed.
A.save('3point.gif', fps=2, writer='imagemagick')  # output path and frames per second of the GIF
plt.show()
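The assignment and update steps in the listing loop over samples and centers one at a time. As a rough sketch of the same round written with NumPy broadcasting, the block below computes all sample-to-center distances at once. The function name `kmeans_step` and the toy `points` array are illustrative assumptions, not part of the original code.

```python
import numpy as np


def kmeans_step(data, centers):
    """One k-means round: assign samples to centers, then recompute centers.

    `kmeans_step` is a hypothetical helper written for this sketch.
    """
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    # Index of the nearest center for every sample.
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for i in range(len(centers)):
        members = data[labels == i]
        if len(members):  # an empty cluster keeps its old center
            new_centers[i] = members.mean(axis=0)
    return labels, new_centers


# Toy data (assumed for illustration): two obvious groups on the x axis.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
labels, centers = kmeans_step(points, points[[0, 2]])
print(labels)
print(centers)
```

Calling `kmeans_step` repeatedly until the centers stop moving gives the same kind of iteration as the generator above, without the per-round yield used for the animation.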

[Figure: K-means clustering into 2 clusters]

[Figure: K-means clustering into 3 clusters]

[Figure: K-means clustering into 4 clusters]
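Runs with 2, 3, and 4 clusters can also be compared numerically. One common measure is the within-cluster sum of squared distances to each center. The sketch below assumes the same output shape as the `kmeans` generator (a list of index lists plus a list of centers). The helper `within_cluster_sse` and the toy values are assumptions for illustration, not results from the watermelon data.

```python
import numpy as np


def within_cluster_sse(data, clusters, centers):
    """Sum of squared distances from every sample to its cluster center.

    `clusters` is a list of index lists and `centers` a list of arrays,
    mirroring what the kmeans generator yields. Hypothetical helper.
    """
    total = 0.0
    for members, center in zip(clusters, centers):
        if not members:  # an empty cluster contributes nothing
            continue
        diff = data[members] - center
        total += float(np.sum(diff ** 2))
    return total


# Toy input (assumed for illustration), not the watermelon dataset.
toy = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
toy_clusters = [[0, 1], [2, 3]]
toy_centers = [np.array([0.0, 1.0]), np.array([4.0, 1.0])]
toy_sse = within_cluster_sse(toy, toy_clusters, toy_centers)
print(toy_sse)
```

This value generally shrinks as the number of clusters grows, so it is more useful for comparing different initial centers at a fixed cluster count than for choosing the count by itself.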
