Precision and recall are two of the most commonly confused topics for machine learning engineers and data scientists. Once you have built a model, the most important question is how good it is. Evaluating your model, i.e., measuring how accurate its predictions are, is therefore one of the most critical activities in a data science project. Precision and recall are two metrics used in this evaluation.
Consider classification problems such as determining whether an email is spam, whether a person is a terrorist, whether a person has cancer, or whether a person is eligible for a loan. In all of these cases, the target column is a Yes/No type (binary classification). A common metric for evaluating classification models is accuracy, which measures how often the model's predictions are correct. However, accuracy is only reliable when the data is balanced. For imbalanced datasets, precision, recall, and F1 score are more useful measures of model performance. Before moving on to precision and recall, we must first understand the confusion matrix.
A confusion matrix is an N×N matrix used to evaluate the performance of a classification model, where N is the number of target classes. It compares the actual target values with the machine learning model's predictions.
True positives, false positives, true negatives, and false negatives are the four distinct outcomes in a binary classification confusion matrix.
- True Positive (TP): data points predicted to be positive that are actually positive.
- False Positive (FP): data points predicted to be positive that are actually negative.
- False Negative (FN): data points predicted to be negative that are actually positive.
- True Negative (TN): data points predicted to be negative that are actually negative.
A model is considered to perform well when the confusion matrix's diagonal values (TP, TN) are high.
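As a minimal sketch, the four outcomes can be tallied directly from a pair of label lists; the labels below are made up for illustration:

```python
# Tally the four confusion-matrix cells for binary classification
# (1 = positive class, 0 = negative class). Hypothetical labels.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
```

Here six of the eight points land on the diagonal (TP + TN = 6), so this hypothetical model classifies most of the data correctly.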
A false positive is a Type I error: the model incorrectly predicts the positive class. For example, predicting that a man is pregnant when he is not.
A false negative is a Type II error: the model incorrectly predicts the negative class. For example, predicting that a woman is not pregnant when she actually is.
Accuracy, Precision, and Recall
When evaluating models against a problem statement, whether in industry or in research, you will come across the concepts of precision and recall.
Precision is the ratio of true positives (TP) to the total number of positive predictions, where the total number of positive predictions is the sum of true positives (TP) and false positives (FP). The precision formula is as follows:
- Precision = TP/ (TP+FP)
Precision measures how many of the data points the model flagged as positive are actually relevant.
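A minimal sketch of the precision formula, using hypothetical counts:

```python
# Precision = TP / (TP + FP): the share of positive predictions
# that were actually positive. Counts are made up for illustration.
tp, fp = 3, 1
precision = tp / (tp + fp)
print(precision)  # 0.75
```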
Recall is the percentage of actual positives that the model correctly identifies; it is also known as the model's sensitivity. Recall is calculated as the ratio of true positives (TP) to the sum of true positives (TP) and false negatives (FN):
- Recall = TP/ (TP+FN)
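The recall formula looks similar in code, again with hypothetical counts:

```python
# Recall = TP / (TP + FN): the share of actual positives that the
# model managed to catch. Counts are made up for illustration.
tp, fn = 3, 1
recall = tp / (tp + fn)
print(recall)  # 0.75
```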
Accuracy is the ratio of correct predictions to the total number of predictions. The accuracy formula is as follows:
- Accuracy = (TP + TN) / (TP+FP+TN+FN)
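Putting the four cells together, a sketch of the accuracy formula with hypothetical counts:

```python
# Accuracy = (TP + TN) / (TP + FP + TN + FN): correct predictions
# over all predictions. Counts are made up for illustration.
tp, fp, tn, fn = 3, 1, 3, 1
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(accuracy)  # 0.75
```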
An efficient model has a high accuracy value, which is a good indicator on balanced (symmetric) datasets. However, accuracy has notable flaws: on a dataset where the classes are imbalanced, a model can score high accuracy while effectively ignoring the minority class. In this scenario, precision and recall come to the rescue. By combining precision, recall, the F1 score, and a confusion matrix, we can build effective model evaluations.
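To see why accuracy misleads on imbalanced data, consider a sketch of a degenerate model that always predicts the negative class on a dataset that is 95% negative (all counts below are made up for illustration):

```python
# An "always negative" model on 95 negatives and 5 positives:
# it misses every positive (FN = 5) yet still looks accurate.
tp, fp, fn, tn = 0, 0, 5, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.95 -- looks great
print(recall)    # 0.0  -- catches no positives at all
```

The 95% accuracy hides the fact that the model never detects a single positive case, which is exactly what recall exposes.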
Every project has its own requirements. Precision and recall are related, but in some circumstances a high recall is more important than high precision, and vice versa. It is crucial to remember that you usually cannot maximize both precision and recall at once. Which metric to prioritize depends on the specific requirements of the project.