
The sklearn toolkit --- classification evaluation metrics (accuracy, recall, F1, ROC, regression, distances)

I. Accuracy, recall, F1, confusion matrix, and the classification report

1. Accuracy

Method 1: accuracy_score

# Accuracy
import numpy as np
from sklearn.metrics import accuracy_score

y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]  # 9 samples, 3 of them match

accuracy_score(y_true, y_pred)
# Out: 0.3333333333333333

accuracy_score(y_true, y_pred, normalize=False)  # with normalize=False the number of correctly classified samples is returned instead of the fraction
# Out: 3
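
As a quick sanity check (an added sketch, not part of the original snippet), the same two numbers can be reproduced with plain NumPy: accuracy is the fraction of exact matches, and normalize=False returns the corresponding count.

import numpy as np

y_pred = np.array([0, 2, 1, 3, 9, 9, 8, 5, 8])
y_true = np.array([0, 1, 2, 3, 2, 6, 3, 5, 9])

matches = (y_true == y_pred)
print(matches.mean())  # 0.333..., same as accuracy_score(y_true, y_pred)
print(matches.sum())   # 3, same as accuracy_score(y_true, y_pred, normalize=False)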

Method 2: the metrics module

Macro-averaging is usually considered more reasonable than micro-averaging, but that does not mean micro-averaging is useless; which one to report ultimately depends on how the samples are distributed across classes in your dataset.

Macro-averaging computes the metric for each class separately and then takes the arithmetic mean over all classes.
Micro-averaging pools every instance in the dataset, regardless of class, into a single global confusion matrix and computes the metric from that. (Source: 谈谈评价指标中的宏平均和微平均)

from sklearn import metrics

metrics.precision_score(y_true, y_pred, average='micro')  # micro-averaged precision
# Out: 0.3333333333333333
metrics.precision_score(y_true, y_pred, average='macro')  # macro-averaged precision
# Out: 0.375
metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro')  # precision restricted to the given labels
# Out: 0.5

The average parameter accepts five settings: None, 'micro', 'macro', 'weighted', and 'samples' (with average=None the per-class scores are returned).
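
To make the macro/micro distinction concrete, here is a small added sketch (not from the original post) that pulls the per-class precisions with average=None and then reproduces the macro and micro numbers above by hand.

import numpy as np
from sklearn import metrics

y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

# Per-class precision for labels 0..3 (average=None returns one value per label)
per_class = metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average=None)
print(per_class)         # [1. 0. 0. 1.]
print(per_class.mean())  # 0.5 -> the macro average over these four labels

# Micro-averaging pools every prediction: total correct / total predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_pred))  # 0.333..., matching average='micro'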

2. Recall

metrics.recall_score(y_true, y_pred, average='micro')
# Out: 0.3333333333333333
metrics.recall_score(y_true, y_pred, average='macro')
# Out: 0.3125

3. F1 score

metrics.f1_score(y_true, y_pred, average='weighted')
# Out: 0.37037037037037035

4. Confusion matrix

# Confusion matrix
from sklearn.metrics import confusion_matrix

confusion_matrix(y_true, y_pred)
# Out:
# array([[1, 0, 0, ..., 0, 0, 0],
#        [0, 0, 1, ..., 0, 0, 0],
#        [0, 1, 0, ..., 0, 0, 1],
#        ...,
#        [0, 0, 0, ..., 0, 0, 1],
#        [0, 0, 0, ..., 0, 0, 0],
#        [0, 0, 0, ..., 0, 1, 0]])

Rows correspond to the true labels and columns to the predicted labels.
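
Because the elided array above is hard to read, here is a small illustrative example (added here) on three classes, using the labels argument to fix the row and column order.

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
# [[2 0 0]   row 0: both true class-0 samples were predicted as 0
#  [1 0 0]   row 1: the single true class-1 sample was predicted as 0
#  [0 0 2]]  row 2: both true class-2 samples were predicted as 2
# In general cm[i, j] counts the samples with true label i and predicted label j.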


 

5. Classification report

# Classification report: precision / recall / f1-score / averages / support per class
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

The output looks like this:

             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      1.00      1.00         2

avg / total       0.67      0.80      0.72         5

It lists precision, recall, f1-score, their averages, and the support (number of samples) for each class.
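
If the numbers are needed programmatically rather than as printed text, classification_report also accepts output_dict=True (available since scikit-learn 0.20); a brief added sketch, with dictionary keys as in recent versions:

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']

report = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
print(report['class 0']['precision'])      # ~0.67
print(report['weighted avg']['f1-score'])  # 0.72 for this toy example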

6. Kappa score

The kappa score is a number between -1 and 1. A score above 0.8 is generally taken to indicate good agreement; a score of 0 or lower means the predictions are no better than random labelling. (A manual check of the value follows the snippet below.)

from sklearn.metrics import cohen_kappa_score

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cohen_kappa_score(y_true, y_pred)
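
As a worked check of what Cohen's kappa measures (this derivation is an addition, not from the original post): kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the marginal label frequencies.

import numpy as np

y_true = np.array([2, 0, 2, 2, 0, 1])
y_pred = np.array([0, 0, 2, 2, 0, 2])
n = len(y_true)

p_o = np.mean(y_true == y_pred)  # observed agreement: 4/6
labels = np.unique(np.concatenate([y_true, y_pred]))
# chance agreement from the label frequencies in y_true and y_pred
p_e = sum((np.sum(y_true == c) / n) * (np.sum(y_pred == c) / n) for c in labels)

kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # ~0.4286, matching cohen_kappa_score(y_true, y_pred)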

II. ROC

1. Computing the ROC AUC value

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)
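
A useful way to read this number (added explanation): for binary labels, the ROC AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. A minimal sketch reproducing the value on the toy data above:

import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]
# fraction of (positive, negative) pairs ranked correctly (ties would count as 0.5)
print(np.mean([p > n for p in pos for n in neg]))  # 0.75, matching roc_auc_score(y_true, y_scores)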

2. ROC curve

import numpy as np
from sklearn.metrics import roc_curve

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
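
Plotting tpr against fpr gives the ROC curve, and auc integrates the area under it; a short added continuation of the snippet above:

import matplotlib.pyplot as plt
from sklearn.metrics import auc

print(auc(fpr, tpr))  # 0.75 for this toy example

plt.plot(fpr, tpr, marker='o')
plt.plot([0, 1], [0, 1], 'k--')  # chance level
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()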

Here is an example from the official documentation; only part of the code is shown below, and the full code can be found in the scikit-learn example Receiver Operating Characteristic (ROC).

import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
# Note: the original example used "from scipy import interp"; that alias has been
# removed from recent SciPy releases, so np.interp is used below instead.

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# ... (omitted: binarizing y, fitting a OneVsRestClassifier, and filling the dicts
#      fpr[i], tpr[i], roc_auc[i] for each class plus the "micro" entries;
#      n_classes is the number of classes -- see the full example linked above)

lw = 2  # line width used in the plots (defined in the omitted part of the example)

# Plotting: first aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes

fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

# Plot all ROC curves
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)

plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
                   ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()

(Figure: the resulting plot shows the micro-average, macro-average, and per-class ROC curves with their AUC values.)

III. Distance metrics

1. Hamming distance (hamming loss)

from sklearn.metrics import hamming_loss

y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)
# Out: 0.25
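
For single-label predictions the hamming loss is simply the fraction of positions where the labels differ; an added one-line check:

import numpy as np

y_pred = np.array([1, 2, 3, 4])
y_true = np.array([2, 2, 3, 4])
print(np.mean(y_true != y_pred))  # 0.25, same as hamming_loss(y_true, y_pred)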

2. Jaccard similarity score

import numpy as np
from sklearn.metrics import jaccard_similarity_score

y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
jaccard_similarity_score(y_true, y_pred)
# Out: 0.5
jaccard_similarity_score(y_true, y_pred, normalize=False)
# Out: 2
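
Relevant to this post's theme of renamed functions: jaccard_similarity_score was deprecated and later removed from scikit-learn. On plain multiclass input like the example above it effectively reduced to accuracy_score, whereas the current replacement, jaccard_score, computes a true per-class Jaccard index and requires an average argument. A hedged sketch for recent scikit-learn versions:

from sklearn.metrics import accuracy_score, jaccard_score

y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]

# What the removed jaccard_similarity_score returned for single-label multiclass data:
print(accuracy_score(y_true, y_pred))                  # 0.5

# The current jaccard_score computes the Jaccard index class by class:
print(jaccard_score(y_true, y_pred, average=None))     # one value per class
print(jaccard_score(y_true, y_pred, average='macro'))  # their unweighted mean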

IV. Regression

1. Explained variance score

from sklearn.metrics import explained_variance_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
explained_variance_score(y_true, y_pred)
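
For reference (added note), the explained variance score is defined as 1 - Var(y_true - y_pred) / Var(y_true); a quick NumPy check on the same data:

import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

ev = 1 - np.var(y_true - y_pred) / np.var(y_true)
print(ev)  # ~0.957, matching explained_variance_score(y_true, y_pred)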

2. Mean absolute error

from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)

3. Mean squared error

from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_squared_error(y_true, y_pred)
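
MSE is the mean of the squared residuals; taking its square root gives the RMSE, which is in the same units as the target and often easier to interpret. A small added check:

import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mse = np.mean((y_true - y_pred) ** 2)
print(mse)           # 0.375, matching mean_squared_error(y_true, y_pred)
print(np.sqrt(mse))  # RMSE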

4. Median absolute error

from sklearn.metrics import median_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
median_absolute_error(y_true, y_pred)

5. R² score (coefficient of determination)

from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)
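
The R² score is defined as 1 - SS_res / SS_tot, one minus the ratio of the residual sum of squares to the total sum of squares around the mean of y_true; a brief added check:

import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)  # ~0.949, matching r2_score(y_true, y_pred)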

References:

sklearn中的模型评估

 
