当前位置:   article > 正文

对聚类结果和真实标签进行映射的两种思路_聚类结果怎么对应标签

聚类结果怎么对应标签

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档


前言

设真实标签有m类,聚类结果类别数目和真实标签类别数目一样,也是m类。但是真实标签与预测标签类标可能不同,比如真实标签是[1,1,2,2,3,3],预测标签是[3,3,1,1,2,2],那就是说真实标签1对应预测标签3,2对应1,3对应1。那么如何实现真实标签和聚类结果标签的映射呢?


一、根据重复度最大的值直接确定标签映射关系

  1. %真实标签:La1 聚类结果标签:La2 映射后的标签:NewLabel
  2. Label1=unique(La1');
  3. L1=length(Label1);
  4. Label2=unique(La2');
  5. L2=length(Label2);
  6. ncls=L2;
  7. label=zeros(1,ncls);
  8. for k=1:L2
  9. index=find(La2==k);
  10. tmp=zeros(1,ncls);
  11. for j=1:ncls
  12. tmp(j)=sum(La1(index)==j);
  13. end
  14. [~,l]=max(tmp);
  15. label(k)=l;
  16. end
  17. NewLabel=label(La2);

这种方法就是我们直接找出对应每个预测标签,每一行中重复度最大的值,确定它的位置就可以了,但是这样的话会出现多个不同行的重复度最大的值在同一列的情况(即真实标签和聚类结果标签的映射不是1对1),这显然是不合理的。

二、使用匈牙利算法

1.计算真实标签和聚类标签结果的重复度,并将结果存储在矩阵G中

  1. %真实标签:La1 聚类结果标签:La2 映射后的标签:NewLabel
  2. Label1=unique(La1');
  3. L1=length(Label1);
  4. Label2=unique(La2');
  5. L2=length(Label2);
  6. %构建计算两种分类标签重复度的矩阵G
  7. G = zeros(max(L1,L2),max(L1,L2));
  8. for i=1:L1
  9. index1= La1==Label1(1,i);
  10. for j=1:L2
  11. index2= La2==Label2(1,j);
  12. G(i,j)=sum(index1.*index2);
  13. end
  14. end

2.使用匈牙利算法

  1. %利用匈牙利算法计算出映射重排后的矩阵
  2. [index]=munkres(-G);
  3. %将映射重排结果转换为一个存储有映射重排后标签顺序的行向量
  4. [temp]=MarkReplace(index);
  5. %生成映射重排后的标签NewLabel
  6. NewLabel=zeros(size(La2));
  7. for i=1:L2
  8. NewLabel(La2==Label2(i))=temp(i);
  9. end
  10. end
  1. function [assignment] = munkres(costMat)
  2. % MUNKRES Munkres Assign Algorithm
  3. %
  4. % [ASSIGN,COST] = munkres(COSTMAT) returns the optimal assignment in ASSIGN
  5. % with the minimum COST based on the assignment problem represented by the
  6. % COSTMAT, where the (i,j)th element represents the cost to assign the jth
  7. % job to the ith worker.
  8. %
  9. % This is vectorized implementation of the algorithm. It is the fastest
  10. % among all Matlab implementations of the algorithm.
  11. % Examples
  12. % Example 1: a 5 x 5 example
  13. %{
  14. [assignment,cost] = munkres(magic(5));
  15. [assignedrows,dum]=find(assignment);
  16. disp(assignedrows'); % 3 2 1 5 4
  17. disp(cost); %15
  18. %}
  19. % Example 2: 400 x 400 random data
  20. %{
  21. n=5;
  22. A=rand(n);
  23. tic
  24. [a,b]=munkres(A);
  25. toc
  26. %}
  27. % Reference:
  28. % "Munkres' Assignment Algorithm, Modified for Rectangular Matrices",
  29. % http://csclab.murraystate.edu/bob.pilgrim/445/munkres.html
  30. % version 1.0 by Yi Cao at Cranfield University on 17th June 2008
  31. assignment = false(size(costMat));
  32. costMat(costMat~=costMat)=Inf;
  33. validMat = costMat<Inf;
  34. validCol = any(validMat);
  35. validRow = any(validMat,2);
  36. nRows = sum(validRow);
  37. nCols = sum(validCol);
  38. n = max(nRows,nCols);
  39. if ~n
  40. return
  41. end
  42. dMat = zeros(n);
  43. dMat(1:nRows,1:nCols) = costMat(validRow,validCol);
  44. %*************************************************
  45. % Munkres' Assignment Algorithm starts here
  46. %*************************************************
  47. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  48. % STEP 1: Subtract the row minimum from each row.
  49. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  50. dMat = bsxfun(@minus, dMat, min(dMat,[],2));
  51. %**************************************************************************
  52. % STEP 2: Find a zero of dMat. If there are no starred zeros in its
  53. % column or row start the zero. Repeat for each zero
  54. %**************************************************************************
  55. zP = ~dMat;
  56. starZ = false(n);
  57. while any(zP(:))
  58. [r,c]=find(zP,1);
  59. starZ(r,c)=true;
  60. zP(r,:)=false;
  61. zP(:,c)=false;
  62. end
  63. while 1
  64. %**************************************************************************
  65. % STEP 3: Cover each column with a starred zero. If all the columns are
  66. % covered then the matching is maximum
  67. %**************************************************************************
  68. primeZ = false(n);
  69. coverColumn = any(starZ);
  70. if ~any(~coverColumn)
  71. break
  72. end
  73. coverRow = false(n,1);
  74. while 1
  75. %**************************************************************************
  76. % STEP 4: Find a noncovered zero and prime it. If there is no starred
  77. % zero in the row containing this primed zero, Go to Step 5.
  78. % Otherwise, cover this row and uncover the column containing
  79. % the starred zero. Continue in this manner until there are no
  80. % uncovered zeros left. Save the smallest uncovered value and
  81. % Go to Step 6.
  82. %**************************************************************************
  83. zP(:) = false;
  84. zP(~coverRow,~coverColumn) = ~dMat(~coverRow,~coverColumn);
  85. Step = 6;
  86. while any(any(zP(~coverRow,~coverColumn)))
  87. [uZr,uZc] = find(zP,1);
  88. primeZ(uZr,uZc) = true;
  89. stz = starZ(uZr,:);
  90. if ~any(stz)
  91. Step = 5;
  92. break;
  93. end
  94. coverRow(uZr) = true;
  95. coverColumn(stz) = false;
  96. zP(uZr,:) = false;
  97. zP(~coverRow,stz) = ~dMat(~coverRow,stz);
  98. end
  99. if Step == 6
  100. % *************************************************************************
  101. % STEP 6: Add the minimum uncovered value to every element of each covered
  102. % row, and subtract it from every element of each uncovered column.
  103. % Return to Step 4 without altering any stars, primes, or covered lines.
  104. %**************************************************************************
  105. M=dMat(~coverRow,~coverColumn);
  106. minval=min(min(M));
  107. if minval==inf
  108. return
  109. end
  110. dMat(coverRow,coverColumn)=dMat(coverRow,coverColumn)+minval;
  111. dMat(~coverRow,~coverColumn)=M-minval;
  112. else
  113. break
  114. end
  115. end
  116. %**************************************************************************
  117. % STEP 5:
  118. % Construct a series of alternating primed and starred zeros as
  119. % follows:
  120. % Let Z0 represent the uncovered primed zero found in Step 4.
  121. % Let Z1 denote the starred zero in the column of Z0 (if any).
  122. % Let Z2 denote the primed zero in the row of Z1 (there will always
  123. % be one). Continue until the series terminates at a primed zero
  124. % that has no starred zero in its column. Unstar each starred
  125. % zero of the series, star each primed zero of the series, erase
  126. % all primes and uncover every line in the matrix. Return to Step 3.
  127. %**************************************************************************
  128. rowZ1 = starZ(:,uZc);
  129. starZ(uZr,uZc)=true;
  130. while any(rowZ1)
  131. starZ(rowZ1,uZc)=false;
  132. uZc = primeZ(rowZ1,:);
  133. uZr = rowZ1;
  134. rowZ1 = starZ(:,uZc);
  135. starZ(uZr,uZc)=true;
  136. end
  137. end
  138. %生成标签矩阵
  139. assignment(validRow,validCol) = starZ(1:nRows,1:nCols);
  140. %解决标签映射问题不需要计算权重cost,故将其注释
  141. %cost = 0;
  142. %cost = sum(costMat(assignment));
  1. %将存储标签顺序的空间矩阵转换为一个行向量
  2. function [assignment] = MarkReplace(MarkMat)
  3. [rows,cols]=size(MarkMat);
  4. assignment=zeros(1,cols);
  5. for i=1:rows
  6. for j=1:cols
  7. if MarkMat(i,j)==1
  8. assignment(1,j)=i;
  9. end
  10. end
  11. end
  12. end

总结

显然,使用第二种方法更好。但是,当预测类标分类数大于实际类标分类数,比如,实际类标10类,预测类标15类,就无法使用匈牙利算法。因为匈牙利算法实际上是一种指派问题,只适合于一对一的指派

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/129014
推荐阅读
相关标签
  

闽ICP备14008679号