
GINI, CUMULATIVE ACCURACY PROFILE, AUC

This article covers how to calculate the Gini coefficient, the Cumulative Accuracy Profile (CAP), and the Area Under the Curve (AUC) of a predictive model. The goal is to explain these concepts in simple terms so that a layperson can understand the mathematics behind them.
Importance of these methods
These methods measure the discriminatory power of a predictive model, that is, how well the model distinguishes between events (the desired outcome) and non-events. In credit risk modeling, they evaluate whether a probability of default (PD) model is able to separate good customers from bad ones. Two of these metrics, the Cumulative Accuracy Profile and the Gini coefficient, are more common in credit risk analytics than in other domains.

Cumulative Accuracy Profile (CAP)

The Cumulative Accuracy Profile (CAP) of a credit rating model shows the cumulative percentage of all borrowers (debtors) on the x-axis and the cumulative percentage of defaulters (bad customers) on the y-axis. In marketing analytics it is called a Gain Chart; in some other domains it is called a Power Curve.
Interpretation
Using the CAP, you can compare the curve of your current model with the curve of an 'ideal' or 'perfect' model, and also with the curve of a random model. 'Perfect model' refers to the ideal state in which all the bad customers (the desired outcome) are captured first, before any good customer. 'Random model' refers to the state in which bad customers are spread evenly across the population. 'Current model' refers to your probability of default model (or any other model you are working on). The aim is always to build a model whose curve lies as close as possible to the curve of the perfect model. The current model's curve can be read as '% of bad customers covered at a given decile level'. For example, the model captures 89% of bad customers by selecting just the top 30% of debtors.
Steps to create Cumulative Accuracy Profile curve
  1. Sort the estimated probabilities of default in descending order and split them into 10 parts (deciles). The riskiest borrowers, with high PD, should fall in the top decile and the safest borrowers in the bottom decile. Splitting the scores into 10 parts is not a hard rule; you can use rating grades instead.
  2. Calculate the number of borrowers (observations) in each decile.
  3. Calculate the number of bad customers in each decile.
  4. Calculate the cumulative number of bad customers in each decile.
  5. Calculate the percentage of bad customers in each decile.
  6. Calculate the cumulative percentage of bad customers in each decile.
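
The six steps above can be sketched in plain Python. This is a minimal illustration and not the article's own implementation: the function name `cap_table`, the column names, and the 20-borrower toy data are all made up for the example, and the sketch assumes the portfolio contains at least one bad customer.

```python
def cap_table(pd_scores, defaults, n_bins=10):
    """Build the per-bin CAP table for a scored portfolio (illustrative sketch).

    pd_scores: estimated probabilities of default, one per borrower.
    defaults:  1 if the borrower is a bad customer, 0 otherwise.
    Assumes at least one bad customer, otherwise the percentages are undefined.
    """
    # Step 1: rank borrowers from riskiest (highest PD) to safest.
    ranked = sorted(zip(pd_scores, defaults), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_bads = sum(flag for _, flag in ranked)

    rows = []
    cum_bads = 0
    for b in range(n_bins):
        # Integer arithmetic keeps the bins as equal in size as possible.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        members = ranked[start:end]
        bads = sum(flag for _, flag in members)      # Step 3
        cum_bads += bads                             # Step 4
        rows.append({
            "bin": b + 1,
            "borrowers": len(members),               # Step 2
            "bads": bads,
            "cum_bads": cum_bads,
            "pct_bads": bads / total_bads,           # Step 5
            "cum_pct_bads": cum_bads / total_bads,   # Step 6
        })
    return rows


# Hypothetical toy portfolio: 20 borrowers, 4 of them bad.
toy_scores = [i / 20 for i in range(20, 0, -1)]      # 1.00, 0.95, ..., 0.05
toy_defaults = [1, 1, 1, 0, 1, 0] + [0] * 14

toy_rows = cap_table(toy_scores, toy_defaults)
for row in toy_rows:
    print(row)
```

With rating grades instead of deciles, the same cumulative sums would be taken per grade rather than per equal-sized bin.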
[Image: Cumulative Accuracy Profile, Current Model]

So far, the calculations have been based on the PD model (recall that the first step uses the probabilities obtained from the PD model).

Next step: How many bad customers should each decile contain under the perfect model?

  1. In the perfect model, the first decile should capture all the bad customers, since the first decile corresponds to the worst rating grade, i.e. the borrowers most likely to default. In our case, the first decile cannot capture all the bad customers, because the number of borrowers in the first decile is smaller than the total number of bad customers. The remaining bad customers therefore fall into the following deciles.
  2. Calculate the cumulative number of bad customers in each decile based on the perfect model.
  3. Calculate the cumulative % of bad customers in each decile based on the perfect model.
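
One way to sketch the perfect-model calculation is shown below. It assumes the perfect model fills each decile entirely with bad customers until none remain. The function name `perfect_cum_pct_bads` and the portfolio numbers (10 deciles of 100 borrowers, 250 bad customers) are hypothetical and chosen only for illustration.

```python
def perfect_cum_pct_bads(bin_sizes, total_bads):
    """Cumulative share of bads captured per bin under a perfect ranking (sketch)."""
    cum_pct = []
    remaining = total_bads
    captured = 0
    for size in bin_sizes:
        # A perfect model fills each bin with bads until none are left.
        take = min(size, remaining)
        captured += take
        remaining -= take
        cum_pct.append(captured / total_bads)
    return cum_pct


# Hypothetical portfolio: 1,000 borrowers in 10 equal deciles, 250 of them bad.
perfect_curve = perfect_cum_pct_bads([100] * 10, total_bads=250)
print(perfect_curve)
```

In this made-up portfolio the first decile holds only 100 borrowers, so it cannot contain all 250 bad customers; the curve reaches 100% in the third decile.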

Next step: Calculate the cumulative percentage of bad customers in each decile based on the random model. In the random model, each decile should contain 10% of the bad customers. The cumulative percentage is therefore 10% in decile 1, 20% in decile 2, and so on up to 100% in decile 10.
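
For the random model the calculation is short enough to write out directly. The sketch below assumes that, under a random ranking, the cumulative share of bad customers simply equals the cumulative share of borrowers, which also covers bins of unequal size such as rating grades. The function name `random_cum_pct_bads` is made up for the example.

```python
def random_cum_pct_bads(bin_sizes):
    """Cumulative share of bads per bin under a random ranking (sketch).

    With no discriminatory power, each bin holds bads in proportion to its size,
    so the cumulative share of bads equals the cumulative share of borrowers.
    """
    total = sum(bin_sizes)
    cum = 0
    cum_pct = []
    for size in bin_sizes:
        cum += size
        cum_pct.append(cum / total)
    return cum_pct


# Ten equal deciles give 10%, 20%, ..., 100%.
random_curve = random_cum_pct_bads([100] * 10)
print(random_curve)
```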
[Image: Cumulative Accuracy Profile, Random Model]

Next step: Plot the cumulative % of bads for the current, random, and perfect models. The x-axis shows the percentage of borrowers (observations) and the y-axis shows the percentage of bad customers.

Accuracy Ratio

For the CAP, the accuracy ratio is the ratio of two areas: the area between the current model's curve and the diagonal line, divided by the area between the perfect model's curve and the diagonal line. In other words, it is the ratio of the performance improvement of the current model over the random model to the performance improvement of the perfect model over the random model.
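
Written as a formula, with $a_R$ for the area between the current model's curve and the diagonal and $a_P$ for the area between the perfect model's curve and the diagonal (these symbol names are chosen here for illustration and are not taken from the article), the definition reads:

$$\text{AR} = \frac{a_R}{a_P}$$

Under this definition a random model has $\text{AR} = 0$, since its curve coincides with the diagonal, and a perfect model has $\text{AR} = 1$.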
How to calculate Accuracy Ratio

The first step is to calculate the area between the current model and the diagonal line. We can compute the area below the current model's curve (including the area below the diagonal line) by numerical integration with the trapezoidal rule. The area of a single trapezoid is

$$(x_{i+1} - x_i) \cdot (y_i + y_{i+1}) \cdot 0.5$$

where $(x_{i+1} - x_i)$ is the width of the subinterval and $(y_i + y_{i+1}) \cdot 0.5$ is the average height.

Here, x is the cumulative proportion of borrowers at each decile level and y is the cumulative proportion of bad customers at each decile level. The starting values $x_0$ and $y_0$ are both 0.
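
The trapezoidal sum can be sketched as follows. The function name `area_under_curve` and both example curves are hypothetical. The first curve is the diagonal of a random model; the second is a made-up current-model curve, not data from the article.

```python
def area_under_curve(x, y):
    """Trapezoidal-rule area under the points (x[i], y[i]) (illustrative sketch).

    x: cumulative proportion of borrowers at each decile level.
    y: cumulative proportion of bad customers at each decile level.
    The starting point (x0, y0) = (0, 0) is prepended automatically.
    """
    xs = [0.0] + list(x)
    ys = [0.0] + list(y)
    area = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]                 # width of the subinterval
        avg_height = (ys[i] + ys[i + 1]) * 0.5    # average height
        area += width * avg_height
    return area


deciles = [(i + 1) / 10 for i in range(10)]       # 0.1, 0.2, ..., 1.0

# Random model: the curve is the diagonal, so y equals x.
diagonal_area = area_under_curve(deciles, deciles)

# Made-up current-model curve that reaches 100% in the third decile.
example_y = [0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
example_area = area_under_curve(deciles, example_y)

print(diagonal_area, example_area)
```

Up to floating-point rounding, the diagonal yields an area of 0.5 and the made-up curve yields 0.875.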

Once the above step is complete, the next step is to subtract 0.5 from the area it returned. You may be wondering where the 0.5 comes from. It is the area below
