What is the Area under the Curve?

Performance evaluation plays a critical role in machine learning, and for a classification task we often rely on the AUC-ROC curve. The Area Under the Curve - Receiver Operating Characteristics (AUC-ROC) curve is used to check or visualize how well a classification model separates its classes (it is defined for binary classification and is commonly extended to multi-class problems). It is one of the most important evaluation metrics for assessing the performance of any classification model, and it is often abbreviated as the Area Under the Receiver Operating Characteristic curve, or AUROC for short.

  • The AUC-ROC curve is a performance measurement for classification problems at various classification-threshold settings.

The higher the AUC, the better the model is at predicting class 0 as 0 and class 1 as 1. By analogy, the higher the AUC, the better the model distinguishes between individuals who have the condition and those who do not.

The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), with TPR on the y-axis and FPR on the x-axis, where TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
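To make this concrete, here is a minimal sketch using scikit-learn; the labels and scores below are invented toy values, not output from a real model.

```python
# A minimal sketch of computing a ROC curve with scikit-learn.
# The labels and scores are made up purely for illustration.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                      # ground-truth classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.55]   # predicted scores for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)      # FPR (x-axis), TPR (y-axis)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))
```

roc_curve returns the FPR and TPR at every threshold implied by the scores, which is exactly what gets plotted as the ROC curve.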

Machine learning model performance

The AUC-ROC curve summarizes a classifier's performance across all possible classification thresholds.

  • An excellent model has an AUC close to one, indicating that it has a high level of separability. A bad model has an AUC close to zero, indicating that it has the poorest measure of separability.

In practice, an AUC near 0 means the model is reversing the results: it ranks negative examples above positive ones. When the AUC is 0.5, the model has no class-separation capacity at all.

The ideal situation is when the two class score distributions do not overlap at all. In that case the model has a perfect measure of separability: it is fully capable of distinguishing between the positive and negative classes.

When the two distributions overlap, we introduce Type 1 errors (false positives) and Type 2 errors (false negatives), and we can trade one off against the other by moving the threshold. An AUC of 0.7, for example, means there is a 70% chance that the model will rank a randomly chosen positive example above a randomly chosen negative one.
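The following sketch illustrates that probabilistic reading of AUC by counting (positive, negative) pairs directly; the data is invented purely for illustration.

```python
# A sketch of the probabilistic reading of AUC: the fraction of
# (positive, negative) pairs in which the positive example gets the higher score.
from itertools import product
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.2, 0.5, 0.3, 0.6, 0.4, 0.9]

pos = [s for y, s in zip(y_true, y_score) if y == 1]
neg = [s for y, s in zip(y_true, y_score) if y == 0]

# Count pairs where the positive outranks the negative (ties count as half).
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
print("Pairwise estimate:", wins / (len(pos) * len(neg)))
print("roc_auc_score    :", roc_auc_score(y_true, y_score))
```

Both numbers agree, because AUC is exactly this pairwise ranking probability.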

The worst case for separability is when the AUC is around 0.5: the model has no ability to differentiate between positive and negative classes.

When the AUC is close to 0, the model is effectively swapping the classes: it predicts the negative class as positive and vice versa.
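A small sketch of the three regimes described above, again with made-up scores: near-perfect separation (AUC close to 1), scores unrelated to the labels (AUC around 0.5), and flipped predictions (AUC close to 0).

```python
# Three toy scoring regimes and the AUC each one produces.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]

perfect = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # positives always scored higher
shuffled = [0.6, 0.3, 0.8, 0.5, 0.4, 0.7, 0.2, 0.9]  # scores unrelated to the label
flipped = [1 - s for s in perfect]                    # predictions reversed

print("perfect :", roc_auc_score(y_true, perfect))   # 1.0
print("shuffled:", roc_auc_score(y_true, shuffled))  # around 0.5
print("flipped :", roc_auc_score(y_true, flipped))   # 0.0
```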

Sensitivity and Specificity

Sensitivity and specificity trade off against one another: as we move the classification threshold, increasing sensitivity reduces specificity and vice versa.

When we lower the threshold, we obtain more positive predictions, which raises sensitivity while lowering specificity.

Similarly, raising the threshold yields more negative predictions, resulting in higher specificity and lower sensitivity.

FPR, as we know, is 1 – specificity. As a result, increasing TPR raises FPR and vice versa.
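Here is a sketch of that threshold trade-off, computing sensitivity and specificity from a confusion matrix at a few thresholds; the labels and scores are toy values chosen for illustration.

```python
# Lowering the threshold raises sensitivity (TPR) and lowers specificity (1 - FPR).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.15, 0.35, 0.55, 0.25, 0.45, 0.65, 0.85, 0.75]

for threshold in (0.3, 0.5, 0.7):
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # TPR
    specificity = tn / (tn + fp)   # 1 - FPR
    print(f"threshold={threshold}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

As the threshold rises from 0.3 to 0.7, sensitivity falls while specificity climbs, exactly the trade-off described above.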

So, why should you use AUC?

  • AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
  • AUC is classification-threshold invariant. It measures the quality of the model’s predictions irrespective of which classification threshold is chosen.

However, both of these properties come with caveats that can limit the usefulness of AUC in certain applications:

  • Scale invariance is not always desirable. For example, sometimes we genuinely need well-calibrated probability outputs, and AUC will not tell us whether we have them (see the sketch after this list).
  • Classification-threshold invariance is not always desirable either. When the cost of false negatives differs greatly from the cost of false positives, it may be critical to minimize one particular kind of classification error. For example, when detecting email spam, you probably want to prioritize minimizing false positives (even if that results in a significant increase in false negatives). AUC is not a useful metric for this kind of optimization.
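To illustrate the scale-invariance caveat, the sketch below applies a monotonic rescaling to a set of invented scores: the ranking, and therefore the AUC, is unchanged, even though the rescaled values can no longer be read as calibrated probabilities.

```python
# A monotonic rescaling of the scores leaves the AUC unchanged,
# even though the values are no longer calibrated probabilities.
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_score = [0.2, 0.6, 0.3, 0.8, 0.7, 0.4, 0.9, 0.1]

# Squash every score into [0, 0.1]; the ranking (and hence the AUC) is preserved.
squashed = [s / 10 for s in y_score]

print(roc_auc_score(y_true, y_score))   # 1.0 for this toy ranking
print(roc_auc_score(y_true, squashed))  # identical AUC
```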