
A MATLAB Implementation of Autoencoder-Based Outlier Detection


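The core idea of unsupervised, autoencoder-based outlier detection is this: train an autoencoder on the dataset to be examined; once training is finished, the objects that are hard to reconstruct (that is, the objects with a large reconstruction error) are considered outliers.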
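Written out as a formula, the outlier score looks roughly like this. The notation is mine, not part of the original post: $x_i \in \mathbb{R}^m$ is the $i$-th sample, $\hat{x}_i$ is the autoencoder's reconstruction of it, and $s_i$ is the score. It corresponds to the squared error summed over features that the code stores in `mse`.

```latex
s_i = \lVert x_i - \hat{x}_i \rVert_2^2 = \sum_{j=1}^{m} \left( x_{ij} - \hat{x}_{ij} \right)^2
```

Samples are then ranked by $s_i$, and the ones with the largest scores are reported as outliers.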

AE-based outlier detection rests on one assumption: outliers are hard for the autoencoder to reconstruct at the output layer.

A MATLAB implementation of AE-based outlier detection is given below:

function [auc, mse] = GD_AE_OD()
%GD_AE_OD Autoencoder-based outlier detection trained by per-sample gradient descent.
%   The outlier score of each sample is its reconstruction error; the larger
%   the score, the more likely the sample is an outlier.
%   Returns the AUC and the per-sample scores. (The placeholder arguments of
%   the original signature were never used or assigned, so they are dropped.)
x = load('Normalization_wbc.txt')';   % each column is one sample
Label = load('Label_wbc.txt');        % 1 = outlier, 0 = normal
Label = Label(:);                     % force a column vector
[m, n] = size(x);                     % m features, n samples
% Initialization
Layer2_hiddensize = 5;                % neurons in the hidden layer
Layer3_hiddensize = m;                % neurons in the output layer (one per feature)
Layer2_w = rand(Layer2_hiddensize, m);   % rand(r,c): r-by-c matrix, uniform on (0,1)
Layer2_b = rand(Layer2_hiddensize, 1);
Layer3_w = rand(Layer3_hiddensize, Layer2_hiddensize);
Layer3_b = rand(Layer3_hiddensize, 1);
iteration = 1000;                     % number of training epochs
LearningRate = 0.1;
Abnormal_number = 20;                 % number of samples to flag as outliers
% Reconstructions are stored separately. The original code wrote them back
% into x, so from the second epoch on the network was fed its own output.
x_rec = zeros(m, n);
TotalLoss = zeros(iteration, 1);
for t = 1:iteration
    % Standard (per-sample) backpropagation
    for i = 1:n
        % Forward pass
        Layer2_output = Sigmoid(Layer2_w * x(:,i) - Layer2_b);
        Layer3_output = Sigmoid(Layer3_w * Layer2_output - Layer3_b);
        x_rec(:,i) = Layer3_output;
        % Backward pass
        Layer3_e = Layer3_output .* (1 - Layer3_output) .* (x(:,i) - Layer3_output);
        Layer2_e = Layer2_output .* (1 - Layer2_output) .* (Layer3_w' * Layer3_e);
        Layer3_w = Layer3_w + LearningRate * Layer3_e * Layer2_output';
        Layer2_w = Layer2_w + LearningRate * Layer2_e * x(:,i)';
        Layer3_b = Layer3_b - LearningRate * Layer3_e;
        Layer2_b = Layer2_b - LearningRate * Layer2_e;
    end
    Loss = (x - x_rec).^2;            % column i holds the squared errors of sample i
    TotalLoss(t) = sum(Loss(:)) / n;  % mean reconstruction error in this epoch
end
mse = sum(Loss, 1)';                  % per-sample outlier score
auc = Measure_AUC(mse, Label);        % Measure_AUC is a separate helper, not defined in this post
disp(auc)
[~, index_number] = sort(mse);
% Indices of the objects the algorithm flags as outliers / as normal
ODA_AbnormalObject_Number = index_number(n-Abnormal_number+1:end);
ODA_NormalObject_Number = index_number(1:n-Abnormal_number);
% ---------------- Evaluation: accuracy, detection rate, false alarm rate ----------------
% Indices of the objects that are truly normal / truly outliers in the dataset
Real_NormalObject_Number = find(Label == 0);
Real_AbnormalObject_Number = find(Label == 1);
% Positive class = outlier, negative class = normal
TP = length(intersect(Real_AbnormalObject_Number, ODA_AbnormalObject_Number));
FN = length(Real_AbnormalObject_Number) - TP;   % true outliers that were missed
TN = length(intersect(Real_NormalObject_Number, ODA_NormalObject_Number));
FP = length(Real_NormalObject_Number) - TN;     % normal objects flagged as outliers
% Accuracy
ACC = (TP + TN) / (TP + TN + FP + FN);
fprintf('Accuracy ACC = %8.5f\n', ACC*100)
% Detection rate (= recall R)
DR = TP / (TP + FN);
fprintf('Detection rate DR = %8.5f\n', DR*100)
% Precision P
P = TP / (TP + FP);
fprintf('Precision P = %8.5f\n', P*100)
% False alarm rate
FAR = FP / (TN + FP);
fprintf('False alarm rate FAR = %8.5f\n', FAR*100)
% Plot the confusion matrix
Confusion_matrix = [TP, FN; FP, TN];
figure(1)
heatmap(Confusion_matrix);
% Plot the training loss per epoch
figure(2)
plot((1:iteration)', TotalLoss, 'LineWidth', 2);
end

function s = Sigmoid(z)
% Logistic sigmoid. The original post calls Sigmoid without defining it;
% this definition is assumed.
s = 1 ./ (1 + exp(-z));
end
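For reference, the per-sample updates in the training loop follow the standard backpropagation (delta) rule for a two-layer sigmoid network trained on the loss $\tfrac{1}{2}\lVert x - \hat{x} \rVert_2^2$. The symbols below are my own shorthand rather than the original post's: $W_2, b_2$ stand for `Layer2_w`, `Layer2_b`; $W_3, b_3$ for `Layer3_w`, `Layer3_b`; $h$ and $\hat{x}$ for the two layer outputs; $\delta_2, \delta_3$ for `Layer2_e`, `Layer3_e`; $\eta$ for `LearningRate`; and $x$ for the input sample.

```latex
\begin{aligned}
h &= \sigma(W_2 x - b_2), & \hat{x} &= \sigma(W_3 h - b_3), \\
\delta_3 &= \hat{x} \odot (1 - \hat{x}) \odot (x - \hat{x}), & \delta_2 &= h \odot (1 - h) \odot (W_3^{\top} \delta_3), \\
W_3 &\leftarrow W_3 + \eta\, \delta_3 h^{\top}, & W_2 &\leftarrow W_2 + \eta\, \delta_2 x^{\top}, \\
b_3 &\leftarrow b_3 - \eta\, \delta_3, & b_2 &\leftarrow b_2 - \eta\, \delta_2 .
\end{aligned}
```

The minus sign in the bias updates comes from the bias being subtracted inside the activation: since $\delta$ is the negative gradient with respect to the pre-activation, the gradient with respect to $b$ has the opposite sign to the gradient with respect to $W$.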

Use this together with the AUC function (Measure_AUC) from my other blog post.
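That function is not reproduced here. As a stand-in, a minimal sketch might look like the following. The body is an assumption, not the implementation from the other post: it treats label 1 as outlier and 0 as normal, and computes the AUC as the fraction of (outlier, normal) pairs in which the outlier gets the higher score, counting ties as one half.

```matlab
function auc = Measure_AUC(scores, labels)
% Hypothetical stand-in for the author's Measure_AUC (assumed behavior).
% scores: outlier scores, larger = more anomalous
% labels: ground truth, 1 = outlier, 0 = normal
scores = scores(:);
labels = labels(:);
pos = scores(labels == 1);   % scores of the true outliers
neg = scores(labels == 0);   % scores of the true normal objects
if isempty(pos) || isempty(neg)
    error('Measure_AUC:oneClass', 'Both classes must be present.');
end
% Compare every outlier score with every normal score.
diffs = bsxfun(@minus, pos, neg');   % numel(pos)-by-numel(neg) matrix
wins = sum(diffs(:) > 0);            % outlier ranked above normal
ties = sum(diffs(:) == 0);           % equal scores count as half
auc = (wins + 0.5 * ties) / (numel(pos) * numel(neg));
end
```

For example, `Measure_AUC([0.9; 0.1; 0.8; 0.2], [1; 0; 1; 0])` returns 1, because both outliers score higher than both normal objects. The pairwise comparison builds a matrix of size (number of outliers) by (number of normal objects), which is fine for a small dataset but would need a rank-based formulation for large ones.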

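Disclaimer: This article was contributed by a user and does not represent the position of 【wpsshop博客】. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/喵喵爱编程/article/detail/920398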