% Set the random number generator seed so the results are reproducible
rng(0);

% Define three cluster centers
means = [1 2 3 4; 5 6 7 8; 9 10 11 12];

% Define the covariance matrix for each cluster; the features are assumed to be
% weakly correlated, so diagonal matrices are used
covariances = repmat(eye(4), [1, 1, 3]) * 0.5; % smaller covariance values mean tighter clusters

% Define the number of samples in each cluster
clusterSizes = [50, 50, 50];

% Initialize the data matrix
data = [];

% Generate the data for each cluster in a loop
for i=1:size(means, 1)
    % Draw random samples for this cluster
    clusterData = mvnrnd(means(i, :), covariances(:, :, i), clusterSizes(i));
    % Append the generated samples to the data matrix
    data = [data; clusterData];
end
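The genetic-algorithm script further down loads its input from data.mat, so the data generated above needs to be saved to that file first. A minimal way to do this (a one-line sketch, assuming the data variable from the loop above is still in the workspace):

% Save the generated data so the GA optimization script below can load it
save("data.mat","data");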
% Perform K-means clustering with the specified number of clusters (K)
K = 3;
[clusterIndices,centroids] = kmeans(data,K);
clear K

% Display the results

% Display the 2-D scatter plot (PCA)
figure
[~,score] = pca(data);
clusterMeans = grpstats(score,clusterIndices,"mean");
h = gscatter(score(:,1),score(:,2),clusterIndices,colormap("lines"));
for i2 = 1:numel(h)
    h(i2).DisplayName = strcat("Cluster",h(i2).DisplayName);
end
clear h i2 score
hold on
h = scatter(clusterMeans(:,1),clusterMeans(:,2),50,"kx","LineWidth",2);
hold off
h.DisplayName = "ClusterMeans";
clear h clusterMeans
legend;
title("First 2 PCA Components of Clustered Data");
xlabel("First principal component");
ylabel("Second principal component");
The figures below show the results for K = 3, 2, and 4, respectively.
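The code above only runs kmeans with K = 3. A minimal sketch for producing the K = 2, 3, and 4 comparison figures (assuming the same data matrix; lines(K) is used here simply to pick plotting colors):

% Sketch: repeat the clustering for several K values and plot each result
% on the first two principal components
[~,score] = pca(data);
for K = [2 3 4]
    idx = kmeans(data,K);
    figure
    gscatter(score(:,1),score(:,2),idx,lines(K));
    title("K-means result for K = " + num2str(K));
    xlabel("First principal component");
    ylabel("Second principal component");
end
clear K idx score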
% Select the best number of clusters (K) from a specified range
fh = @(X,K)(kmeans(X,K));
eva = evalclusters(data,fh,"CalinskiHarabasz","KList",2:5);
clear fh
K = eva.OptimalK;
clusterIndices = eva.OptimalY;

% Display the clustering criterion values
figure
bar(eva.InspectedK,eva.CriterionValues);
xticks(eva.InspectedK);
xlabel("Number of clusters");
ylabel("Criterion values - Calinski-Harabasz");
legend("Optimal number of clusters is " + num2str(K))
title("Evaluation of Optimal Number of Clusters")
disp("Optimal number of clusters is " + num2str(K));
clear K eva

% Compute the centroids
centroids = grpstats(data,clusterIndices,"mean");
% Display the results

% Display the 2-D scatter plot (PCA) -- same plotting code as above, so it is
% not repeated here

% Select the best number of clusters (K) from the same range, this time using
% the Davies-Bouldin criterion
fh = @(X,K)(kmeans(X,K));
eva = evalclusters(data,fh,"DaviesBouldin","KList",2:5);
clear fh
K = eva.OptimalK;
clusterIndices = eva.OptimalY;

% Display the clustering criterion values (same bar-chart code as above, with
% "Davies-Bouldin" as the criterion label)

% Compute the centroids
centroids = grpstats(data,clusterIndices,"mean");
Optimization objective
function o = ob_GA_kmeans(x,data,K,n)

    m = reshape(x,[K,n]);   % reshape the decision vector x into a K-by-n matrix of centroids
    d = pdist2(data,m);     % Euclidean distance from every sample to every centroid
    [dmin,~] = min(d,[],2); % distance from each sample to its nearest centroid
    o = sum(dmin);          % objective: total distance to the nearest centroids
end
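As a quick sanity check of the objective function (a sketch, not part of the original script; it assumes data and the centroids returned by the earlier kmeans call are still in the workspace), the objective can be evaluated at the k-means centroids and at random centroids; the former should come out clearly smaller:

% Sketch: evaluate the GA objective at known and at random centroids
K = size(centroids,1);
n = size(data,2);
xKmeans = reshape(centroids,1,[]);                 % flatten the K-by-n centroids into a 1-by-(K*n) row vector
objAtKmeans = ob_GA_kmeans(xKmeans,data,K,n)       % total nearest-centroid distance at the k-means centroids
objAtRandom = ob_GA_kmeans(randn(1,K*n),data,K,n)  % typically much larger for random centroids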
Optimization
%% Initialization

clc;clear;
load("data.mat");
n = size(data,2);   % number of variables (features)
K = 3;              % use a default value for K
N = n*K;            % number of decision variables: the centroids form a matrix
                    % with one row per cluster and one column per variable,
                    % flattened into a single vector

%% Optimization

% Pass the fixed parameters to objfun
objfun = @(x)ob_GA_kmeans(x,data,K,n);

% Set non-default solver options
options = optimoptions("ga","PlotFcn","gaplotbestf");

% Solve
[solution,objectiveValue] = ga(objfun,N,[],[],[],[],[],[],[],[],options);

% Clear temporary variables
clearvars objfun options
%% Plotting

m = reshape(solution,[K,n]);   % reshape the solution into a K-by-n matrix of centroids
d = pdist2(data,m);            % Euclidean distance from every sample to every centroid
[dmin,post] = min(d,[],2);     % assign each sample to its nearest centroid

% Display the 2-D scatter plot (PCA)
figure
[~,score] = pca(data);
clusterMeans = grpstats(score,post,"mean");
h = gscatter(score(:,1),score(:,2),post,colormap("lines"));
for i2 = 1:numel(h)
    h(i2).DisplayName = strcat("Cluster",h(i2).DisplayName);
end
clear h i2 score
hold on
h = scatter(clusterMeans(:,1),clusterMeans(:,2),50,"kx","LineWidth",2);
hold off
h.DisplayName = "ClusterMeans";
clear h clusterMeans
legend;
title("First 2 PCA Components of Clustered Data");
xlabel("First principal component");
ylabel("Second principal component");
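To judge how well the GA solution does against plain k-means under the same objective (total distance to the nearest centroid), a short comparison can be appended. This is just a sketch; it assumes data, K, and dmin from the script above are still in the workspace:

% Sketch: compare the GA solution with standard k-means under the same objective
[~,Ckm] = kmeans(data,K);                 % centroids found by standard k-means
dKm = pdist2(data,Ckm);                   % distances from every sample to those centroids
fprintf("GA objective:      %.4f\n", sum(dmin));
fprintf("k-means objective: %.4f\n", sum(min(dKm,[],2)));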
Results