scipy.cluster.hierarchy.linkage(data, method='average', metric="euclidean") performs the hierarchical clustering, i.e., it carries out the three steps described above.
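To make the return value of `linkage` concrete, here is a small sketch on four made-up 1-D points (the data are hypothetical, chosen so the arithmetic is easy to follow). Each row of the returned matrix `Z` records one merge as [cluster_a, cluster_b, distance, size of new cluster]. The n original points are clusters 0..n-1, and each merge creates a new cluster numbered n, n+1, and so on.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical observations on a line: two tight pairs far apart
X = np.array([[0.0], [1.0], [5.0], [7.0]])

Z = hierarchy.linkage(X, method='average', metric='euclidean')

# Z has n - 1 rows: [cluster_a, cluster_b, merge_distance, new_cluster_size]
for a, b, dist, size in Z:
    print(int(a), int(b), dist, int(size))
# 0 1 1.0 2   -> points 0 and 1 merge into cluster 4
# 2 3 2.0 2   -> points 2 and 3 merge into cluster 5
# 4 5 5.5 4   -> mean of |0-5|, |0-7|, |1-5|, |1-7| = 22 / 4
```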
metric : str or function, optional
The distance metric to use. The distance function can
be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching',
'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
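Because `metric` may be a string or a function, the following sketch shows three equivalent ways to supply distances. The data, the seed and the helper `manhattan` are hypothetical. Passing a metric name works as listed above. You can also pass a precomputed condensed (1-D) distance vector from `scipy.spatial.distance.pdist`, in which case `metric` is not used. Finally, you can pass your own function that takes two 1-D vectors and returns a float.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # hypothetical data: 10 observations, 3 features

# 1) metric given by name
Z_str = hierarchy.linkage(X, method='average', metric='cityblock')

# 2) distances precomputed as a condensed 1-D vector; metric is then not needed
Z_pre = hierarchy.linkage(pdist(X, metric='cityblock'), method='average')

# 3) a user-defined metric (hypothetical helper) taking two 1-D vectors
def manhattan(u, v):
    return np.abs(u - v).sum()

Z_fun = hierarchy.linkage(X, method='average', metric=manhattan)

print(np.allclose(Z_str, Z_pre), np.allclose(Z_str, Z_fun))  # expected: True True
```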
* method='single' assigns

  .. math::
     d(u,v) = \min(dist(u[i],v[j]))

  for all points :math:`i` in cluster :math:`u` and :math:`j` in cluster :math:`v`. This is also known as the Nearest Point Algorithm.

* method='complete' assigns

  .. math::
     d(u,v) = \max(dist(u[i],v[j]))

  for all points :math:`i` in cluster :math:`u` and :math:`j` in cluster :math:`v`. This is also known as the Farthest Point Algorithm or the Voor Hees Algorithm.

* method='average' assigns

  .. math::
     d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}{(|u|*|v|)}

  for all points :math:`i` and :math:`j`, where :math:`|u|` and :math:`|v|` are the cardinalities of clusters :math:`u` and :math:`v`, respectively. This is also called the UPGMA algorithm.

* method='weighted' assigns

  .. math::
     d(u,v) = (dist(s,v) + dist(t,v))/2

  where cluster :math:`u` was formed from clusters :math:`s` and :math:`t`, and :math:`v` is a remaining cluster in the forest (also called WPGMA).

* method='centroid' assigns

  .. math::
     dist(s,t) = ||c_s-c_t||_2

  where :math:`c_s` and :math:`c_t` are the centroids of clusters :math:`s` and :math:`t`, respectively. When two clusters :math:`s` and :math:`t` are combined into a new cluster :math:`u`, the new centroid is computed over all the original objects in clusters :math:`s` and :math:`t`. The distance then becomes the Euclidean distance between the centroid of :math:`u` and the centroid of a remaining cluster :math:`v` in the forest. This is also known as the UPGMC algorithm.

* method='median' assigns :math:`d(s,t)` like the ``centroid`` method. When two clusters :math:`s` and :math:`t` are combined into a new cluster :math:`u`, the average of the centroids of :math:`s` and :math:`t` gives the new centroid of :math:`u`. This is also known as the WPGMC algorithm.

* method='ward' uses the Ward variance minimization algorithm. The new entry :math:`d(u,v)` is computed as follows:

  .. math::
     d(u,v) = \sqrt{\frac{|v|+|s|}{T}d(v,s)^2 + \frac{|v|+|t|}{T}d(v,t)^2 - \frac{|v|}{T}d(s,t)^2}

  where :math:`u` is the newly joined cluster consisting of clusters :math:`s` and :math:`t`, :math:`v` is an unused cluster in the forest, :math:`T=|v|+|s|+|t|`, and :math:`|*|` is the cardinality of its argument. This is also known as the incremental algorithm.
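The single, complete and average definitions above can be checked directly. Below is a deliberately naive O(n^3) sketch, not SciPy's actual algorithm. It merges the closest pair of clusters at each step, using the min, max or mean of the original point-to-point distances as the cluster distance d(u, v). It then compares the merge heights with `linkage`. The data and the helper name `naive_linkage_heights` are hypothetical. The centroid, median, weighted and ward methods are not covered here.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import cdist

def naive_linkage_heights(X, method):
    """Hypothetical helper: greedy agglomeration that applies the single /
    complete / average definitions to the original point distances."""
    D = cdist(X, X)  # Euclidean distances between all original points
    agg = {'single': np.min, 'complete': np.max, 'average': np.mean}[method]
    clusters = [[i] for i in range(len(X))]  # start with singleton clusters
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # d(u, v) over all pairs (i in u, j in v)
                d = agg(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]  # b > a, so index a is still valid
    return np.array(heights)

rng = np.random.default_rng(42)
X = rng.normal(size=(12, 2))  # hypothetical 2-D data

for method in ('single', 'complete', 'average'):
    Z = hierarchy.linkage(X, method=method, metric='euclidean')
    same = np.allclose(np.sort(Z[:, 2]), np.sort(naive_linkage_heights(X, method)))
    print(method, same)  # each line should report True
```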
scipy.cluster.hierarchy.dendrogram(Z, labels=label, above_threshold_color='C0') is mainly used to plot the hierarchical clustering tree (the dendrogram).
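`dendrogram` also returns the layout it computes. The sketch below uses `no_plot=True` so that nothing is drawn and matplotlib is not needed. The data and labels are hypothetical. `leaves` gives the original observation indices in left-to-right order, and `ivl` gives the matching labels. `color_list` holds one color per link. Links whose height exceeds `color_threshold` (by default 0.7 times the largest merge distance) are drawn in `above_threshold_color`.

```python
import numpy as np
from scipy.cluster import hierarchy

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated groups of 5 points each
X = np.vstack([rng.normal(0, 1, size=(5, 2)), rng.normal(10, 1, size=(5, 2))])
label = np.array([0] * 5 + [1] * 5)  # hypothetical leaf labels

Z = hierarchy.linkage(X, method='average', metric='euclidean')

# Compute the dendrogram layout without drawing it
d = hierarchy.dendrogram(Z, labels=label, above_threshold_color='C0', no_plot=True)

print(d['leaves'])      # observation indices in left-to-right plot order
print(d['ivl'])         # the labels shown under those leaves
print(d['color_list'])  # one color per link; the top link uses 'C0'
```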
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
from scipy.cluster import hierarchy  # hierarchical clustering

# Font settings: only needed when titles or labels contain Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']  # a font with CJK glyphs
plt.rcParams['axes.unicode_minus'] = False  # draw the minus sign '-' correctly instead of as a box

iris = load_iris()
data = iris.data
label = iris.target

fig, ax = plt.subplots(1, 1, figsize=(50, 8))  # figsize is the canvas size in inches
# Merge clusters by average distance (UPGMA); point distances are Euclidean
Z = hierarchy.linkage(data, method='average', metric="euclidean")
hierarchy.dendrogram(Z, labels=label, above_threshold_color='C0', ax=ax)  # draw the dendrogram
plt.xticks(fontsize=14, rotation=0)  # x-axis tick label font size and orientation
plt.tight_layout()  # adjust subplot parameters so the plot fills the figure area
# dpi passed to savefig sets the saved image resolution (it overrides rcParams['savefig.dpi'])
plt.savefig("H_iris.png", dpi=100, bbox_inches='tight')  # save the figure
plt.show()
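The 50-inch-wide canvas is needed because every one of the 150 leaves is drawn. One possible alternative is truncation. With `truncate_mode='lastp'`, only the last `p` merged clusters are shown as leaves, and each collapsed leaf is labelled with its observation count, such as "(12)". The sketch below uses random data of the same shape as `iris.data`, a hypothetical stand-in that avoids the scikit-learn dependency. It also uses `no_plot=True` so it runs without a display. To draw it instead, drop `no_plot` and pass `ax=ax` as in the script above.

```python
import numpy as np
from scipy.cluster import hierarchy

rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))  # hypothetical stand-in shaped like iris.data

Z = hierarchy.linkage(data, method='average', metric='euclidean')

# Keep only the last p merged clusters as leaves; a collapsed leaf is labelled "(count)"
d = hierarchy.dendrogram(Z, truncate_mode='lastp', p=30, no_plot=True)

print(len(d['ivl']))  # 30 leaves instead of 150

# Every original observation is still accounted for across the 30 leaves
total = sum(int(s.strip('()')) if s.startswith('(') else 1 for s in d['ivl'])
print(total)  # 150
```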
