What is IoU?

Selecting the best model architecture and pre-trained weights for your task can be difficult. Aggregate metrics only go so far: to figure out what is working and what isn’t, you need to look at the data and the model predictions in greater depth.

While it’s crucial to be able to compare multiple models easily, reducing a model’s performance to a single number (mAP) can mask details in the model predictions that matter for your use case. You should also consider:

  • IoU
  • High-confidence false positives
  • Individual samples, to spot-check performance
  • Performance in the classes that are most important to your work

Intersection over union (IoU) is used when computing mAP. It is a value between 0 and 1 that measures how much a predicted bounding box overlaps a ground truth bounding box: the area of their intersection divided by the area of their union.

  • An IoU of 0 indicates that the boxes do not overlap.
  • An IoU of 1 indicates that the boxes’ union equals their intersection, meaning that the two boxes coincide exactly.
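To make the definition concrete, here is a minimal sketch of an IoU computation. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples; the function name `iou` and the box format are illustrative choices, not something this article prescribes.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x_min, y_min, x_max, y_max) format (assumed)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the intersection rectangle; zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # boxes sharing half their width
```

The three calls cover the two extremes from the list above plus a partial overlap, where the intersection (50) is divided by the union (150).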

IoU is also a crucial accuracy metric to track when collecting human annotations. A common industry practice is to set a minimum IoU requirement for human annotation tasks, so that the delivered annotations have an IoU >= X against the “perfect” annotation of that object, as defined by the project’s annotation schema. State-of-the-art detectors, on the other hand, seldom achieve a 0.95 IoU, as we demonstrate in the tests later in this piece.


The mean average precision (mAP) is computed by comparing a set of predicted object detections against a set of ground truth object annotations.

For each prediction, IoU is calculated with respect to every ground truth box in the image.

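These IoUs are then thresholded at a fixed value (typically between 0.5 and 0.95): a prediction counts as a match only if its IoU with a ground truth box meets the threshold.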
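The step above, deciding which detections count as matches at a given IoU value, can be sketched as follows. This is one common greedy scheme (highest-confidence detections are matched first, and each ground truth box can be matched at most once); benchmark tooling differs in the details. The names `match_detections` and `iou`, and the `(box, score)` input format, are assumptions made for illustration.

```python
def iou(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) format (assumed)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truths, iou_threshold):
    """Label each detection of one class in one image as a true or false positive.

    detections: list of (box, confidence) pairs.
    ground_truths: list of boxes.
    Returns a list of (confidence, is_true_positive), highest confidence first.
    """
    matched = set()  # indices of ground truth boxes already claimed
    results = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        results.append((score, is_tp))
    return results


ground_truths = [(0, 0, 10, 10)]
detections = [((0, 0, 10, 9), 0.9), ((0, 0, 10, 6), 0.8)]

# The same detections can be judged very differently as the threshold tightens.
for threshold in (0.5, 0.75, 0.95):
    print(threshold, match_detections(detections, ground_truths, threshold))
```

At a threshold of 0.5 the first detection (IoU 0.9) is a true positive and the second becomes a false positive because the only ground truth box is already taken; at 0.95 neither detection qualifies.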

For each object class, a precision-recall (PR) curve is created, and the average precision (AP) is calculated. A PR curve considers a model’s performance in terms of true positives, false positives, and false negatives across a variety of confidence levels.
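As a rough illustration of how AP summarizes a PR curve, the sketch below takes detections already labeled as true or false positives and computes the area under the precision-recall curve using all-point interpolation. Evaluation protocols differ here (some sample precision at fixed recall levels instead), so treat this as one reasonable variant rather than the exact procedure behind any specific benchmark. The function name `average_precision` and its input format are assumptions.

```python
def average_precision(scored_matches, num_ground_truths):
    """Area under the PR curve for one class at one IoU threshold.

    scored_matches: list of (confidence, is_true_positive) pairs.
    num_ground_truths: total number of ground truth boxes for this class.
    """
    if num_ground_truths == 0:
        return 0.0

    # Sweep the confidence threshold from high to low.
    ordered = sorted(scored_matches, key=lambda m: m[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truths)

    # Precision envelope: at each point, use the best precision achievable
    # at that recall or higher, which removes the zigzag in the raw curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the envelope wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


matches = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(matches, num_ground_truths=2))
```

mAP is then the mean of these per-class AP values, optionally averaged again over several IoU thresholds. Ground truth boxes that are never detected lower AP implicitly, because recall never reaches 1.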

  • The main reason mAP has become the de facto standard for comparing object detection models is convenience: you only need a single number to compare the performance of different models.

Key takeaways

There is no ideal model for every task; the best model for you depends on the criteria you choose and the end use case you have in mind. Each of the three models we’ve looked at shines in distinct scenarios in ways that its mAP doesn’t explain.

The essential lesson is that, rather than relying on mAP alone, it is critical to use the right metric to guide model evaluation and selection for your specific operating conditions. After assessing several aspects of model performance, no single model performed best in all three areas above. Depending on the end task, any of the three models shown below might be the best option.

Drawing and adjusting bounding boxes is substantially more costly than image-level or detection-level annotation. You’ll want your predicted boxes to be as tight as possible so that annotators can quickly go through them and decide which to keep and which to discard without having to change the boxes.