What are anchor boxes?

Object detection models such as YOLO, SSD, and EfficientDet start from anchor boxes as priors and refine them in order to predict and localize many different objects in an image.

Anchor boxes are usually used in the following sequence in modern models:

  • Generate thousands of candidate anchor boxes across the image.
  • For each anchor box, predict an offset that turns it into a candidate box.
  • Compute a loss function using the ground-truth boxes as the target.
  • Determine how much each offset box overlaps an actual object.
  • If the overlap exceeds a threshold (commonly an intersection over union, or IoU, of 0.5), include the prediction in the loss function.
  • By rewarding good predicted boxes and penalizing bad ones, the loss gradually pushes the model to localize only real objects.
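The matching step in the list above can be sketched in plain Python. This is a minimal sketch, assuming boxes are (x_min, y_min, x_max, y_max) tuples and that the 0.5 threshold is applied to the intersection over union (IoU) between each anchor and its best-overlapping ground-truth box. The names `iou` and `match_anchors` are illustrative, not taken from any library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors: List[Box], gt_boxes: List[Box],
                  threshold: float = 0.5) -> List[int]:
    """For each anchor, return the index of its best-overlapping ground-truth
    box, or -1 (background) if that best IoU does not exceed `threshold`."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, 0.0
        for j, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_idx, best_iou = j, overlap
        # Only anchors above the threshold contribute to the box loss.
        matches.append(best_idx if best_iou > threshold else -1)
    return matches


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
print(match_anchors(anchors, [(1, 1, 11, 11)]))  # [0, -1, -1]
```

Only the first anchor overlaps the ground-truth box enough (IoU of about 0.68) to be treated as a positive; the other two count as background.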

This is why you’ll see predicted boxes all over the place when you’ve barely trained a model.

After training, your model makes only high-confidence predictions, based on the anchor-box offsets it considers most likely to be correct.

  • Anchor boxes are a predefined set of boxes whose heights and widths are chosen to match the typical heights and widths of the objects in the data.

The anchor boxes are chosen to cover the range of object sizes and aspect ratios likely to appear in a dataset. It is common to choose between 4 and 10 anchor shapes, which are then proposed at many positions across the image.
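One common way to build such a set, used for example by Faster R-CNN with three scales and three aspect ratios, is to take every combination of a few scales and aspect ratios. The sketch below assumes that "ratio" means width divided by height and that "scale" is the square root of the anchor's area; the specific values are placeholders, not recommendations.

```python
import math
from typing import List, Tuple


def anchor_shapes(scales: List[float], ratios: List[float]) -> List[Tuple[float, float]]:
    """Return (width, height) for every scale/aspect-ratio combination.

    `scale` is the square root of the anchor area and `ratio` is
    width / height, so every anchor built from a scale s has area s**2.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            shapes.append((w, h))
    return shapes


shapes = anchor_shapes(scales=[32, 64, 128], ratios=[0.5, 1.0, 2.0])
print(len(shapes))  # 9
```

Three scales times three ratios gives nine shapes, which falls inside the 4 to 10 range mentioned above.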

Deep neural networks achieve outstanding results in image classification and object detection in computer vision. Early sliding-window detectors located a single object per pass. They have since been replaced by detectors that process the full image and produce many detections at once. To keep this dense, sliding-window-style search efficient and fast, these detectors rely heavily on anchor boxes.

Training a detection network typically involves choosing anchors with traditional computer vision techniques and pairing the proposed anchors with ground-truth boxes. Note that the anchor-box approach predicts a fixed number of boxes: one prediction per anchor at each location.
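As an example of such a traditional technique, YOLOv2 picks its anchor shapes by running k-means over the widths and heights of the training boxes, using 1 - IoU as the distance. The sketch below is a minimal pure-Python version under that assumption. It initializes the centers from the first `k` boxes only to stay deterministic; a practical version would use random or k-means++ initialization. The function names are illustrative.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height)


def shape_iou(a: Size, b: Size) -> float:
    """IoU of two boxes that share the same center, compared by size only."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes: List[Size], k: int, iters: int = 20) -> List[Size]:
    """Cluster ground-truth box sizes with distance d = 1 - IoU."""
    centers = list(sizes[:k])  # simple deterministic initialization
    for _ in range(iters):
        clusters: List[List[Size]] = [[] for _ in range(k)]
        for s in sizes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: shape_iou(s, centers[i]))
            clusters[best].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[i] = (sum(w for w, _ in members) / len(members),
                              sum(h for _, h in members) / len(members))
    return centers


sizes = [(10, 10), (100, 100), (11, 9), (9, 11), (98, 102), (102, 98)]
print(kmeans_anchors(sizes, k=2))  # [(10.0, 10.0), (100.0, 100.0)]
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering simply because their width and height values are bigger.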

When are anchor boxes needed?

Proposing anchors involves three steps: identifying a set of box shapes that fit most of the objects in the data, placing these imaginary boxes at equally spaced positions across the image, and defining a rule that maps each output of the feature map to a location in the image.
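The three steps above can be sketched as a function that tiles a fixed list of anchor shapes over a feature-map grid. The mapping rule used here is an assumption: each feature-map cell is mapped to the center of the stride-by-stride image patch it covers, which is one common convention; other detectors use slightly different offsets. `tile_anchors` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def tile_anchors(feat_h: int, feat_w: int, stride: int,
                 shapes: List[Tuple[float, float]]) -> List[Box]:
    """Place every anchor shape at the image position of each feature-map cell.

    Cell (row, col) maps to the image point at the center of the patch it
    covers: ((col + 0.5) * stride, (row + 0.5) * stride).
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = tile_anchors(feat_h=2, feat_w=3, stride=16, shapes=[(32, 32), (64, 32)])
print(len(anchors))  # 12 = 2 * 3 cells * 2 shapes
print(anchors[0])    # (-8.0, -8.0, 24.0, 24.0)
```

Anchors near the border can extend past the image edge, as the first one does here; whether to clip or discard them is a per-model design choice.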

As a result, the set of anchor boxes that describes your data is generated only once. This can be done at any point before the predicted offsets are added to the proposed anchor at each feature-map location.

  • Detectors do not predict boxes directly; instead, for each proposed anchor box they predict a set of values, such as coordinate offsets from the anchor and a confidence score for every class they were trained on.
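To make the bullet above concrete, here is a sketch of how one feature-map cell's raw output might be read. The layout (four offsets followed by one score per class, repeated for each anchor) is an assumption for illustration; real detectors order these channels differently, and some, such as YOLO, also predict a separate objectness score. `split_cell_output` is a hypothetical helper.

```python
from typing import List, Tuple


def split_cell_output(values: List[float], num_anchors: int,
                      num_classes: int) -> List[Tuple[List[float], List[float]]]:
    """Split one feature-map cell's output vector into per-anchor predictions.

    Assumed layout: for each anchor, 4 box offsets (tx, ty, tw, th)
    followed by `num_classes` confidence scores.
    """
    per_anchor = 4 + num_classes
    if len(values) != num_anchors * per_anchor:
        raise ValueError("output length does not match anchors * (4 + classes)")
    result = []
    for a in range(num_anchors):
        chunk = values[a * per_anchor:(a + 1) * per_anchor]
        result.append((chunk[:4], chunk[4:]))  # (offsets, class scores)
    return result
```

For example, with 2 anchors and 3 classes, a cell outputs 14 numbers, and `split_cell_output` returns two (offsets, scores) pairs. Nothing in these numbers is a box yet; they only become boxes once they are combined with the fixed anchors.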

This means the same anchors are used for every image, and the offsets predicted in a forward pass adjust those proposals. Until its output is post-processed, the network has no notion of how a feature-map coordinate maps to a position in the image, or that its output corresponds to a box.
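The adjustment of fixed anchors by predicted offsets can be sketched as follows. This assumes the center-shift and log-scale parameterization popularized by Faster R-CNN and reused by many anchor-based detectors; a given model may use a different encoding (YOLOv2, for instance, applies a sigmoid to its center offsets), so treat this as one illustrative choice rather than the universal rule.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode(anchor: Box, offsets: Tuple[float, float, float, float]) -> Box:
    """Apply predicted offsets (tx, ty, tw, th) to an anchor box.

    The center moves by a fraction of the anchor's size, and the
    width and height are scaled by exp(tw) and exp(th).
    """
    tx, ty, tw, th = offsets
    wa = anchor[2] - anchor[0]
    ha = anchor[3] - anchor[1]
    cxa = anchor[0] + wa / 2
    cya = anchor[1] + ha / 2
    cx = cxa + tx * wa
    cy = cya + ty * ha
    w = wa * math.exp(tw)
    h = ha * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor = (0.0, 0.0, 32.0, 32.0)
print(decode(anchor, (0.0, 0.0, 0.0, 0.0)))   # (0.0, 0.0, 32.0, 32.0)
print(decode(anchor, (0.25, 0.0, 0.0, 0.0)))  # (8.0, 0.0, 40.0, 32.0)
```

Zero offsets return the anchor unchanged, and a `tx` of 0.25 shifts it right by a quarter of its width. This is why an untrained network, whose offsets are essentially arbitrary, scatters boxes everywhere.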

Because each image is always paired with the same fixed set of anchor proposals, and the ground truth does not change during training, there is no need to match anchors to background or ground-truth boxes more than once.

Naturally, this depends on your specific optimization goals and how capable your batch generator is. The batch generator is often responsible both for generating proposals and for matching them with ground truth. Although proposal layers are sometimes appended to the network to attach anchor data to its output tensor, the mechanism for generating and tiling proposals across an image stays the same.
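Since the anchors and the ground truth are fixed, a batch generator can match and encode each image's targets once and reuse them every epoch. The sketch below is a hypothetical `TargetCache`, assuming the IoU > 0.5 matching rule and the center-shift/log-scale offset encoding common in anchor-based detectors. It keys the cache on an image identifier that your data pipeline would have to supply.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]      # (x_min, y_min, x_max, y_max)
Offsets = Tuple[float, float, float, float]  # (tx, ty, tw, th)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(anchor: Box, gt: Box) -> Offsets:
    """Offsets that turn `anchor` into `gt` (the inverse of decoding)."""
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    tx = ((gt[0] + wg / 2) - (anchor[0] + wa / 2)) / wa
    ty = ((gt[1] + hg / 2) - (anchor[1] + ha / 2)) / ha
    return (tx, ty, math.log(wg / wa), math.log(hg / ha))


class TargetCache:
    """Match and encode each image's targets once, then reuse them.

    Valid only because the anchors and the ground truth never change.
    A target of None marks a background anchor.
    """

    def __init__(self, anchors: List[Box], threshold: float = 0.5):
        self.anchors = anchors
        self.threshold = threshold
        self._cache: Dict[str, List[Optional[Offsets]]] = {}
        self.computed = 0  # number of images actually matched

    def targets(self, image_id: str, gt_boxes: List[Box]) -> List[Optional[Offsets]]:
        if image_id not in self._cache:
            self.computed += 1
            result: List[Optional[Offsets]] = []
            for a in self.anchors:
                best = max(gt_boxes, key=lambda g: iou(a, g), default=None)
                if best is not None and iou(a, best) > self.threshold:
                    result.append(encode(a, best))
                else:
                    result.append(None)
            self._cache[image_id] = result
        return self._cache[image_id]
```

On the second and later requests for the same image, `targets` returns the stored list without recomputing any IoU values.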

  • Understanding this helps you decide at which point in the pipeline the anchors should be generated, so that this data structure is ready when the system needs it.