Bounding Box Annotations

Data annotation is one of the major tasks in computer vision. It gives machine learning models the labeled examples they need to learn the relationship between an input and its expected output. There are many image and video annotation techniques today, but one of the most fundamental is the bounding box. The technique is extremely popular and easy to implement, yet applying it well in your own projects takes some care.

In this article we will discuss some best practices and tricks that will enable you to implement bounding boxes effectively in your own dataset.

What is bounding box annotation?

Bounding box annotation is the process of manually labeling an image by drawing a bounding box around a specific object or feature of interest. This annotation technique is commonly used in computer vision and machine learning applications, particularly in object detection.

In bounding box annotation, a human annotator will draw a rectangle around the object or feature of interest in an image and label it with a class label. The annotator will also specify the coordinates of the bounding box, which typically consist of the x and y coordinates of the top left corner and the x and y coordinates of the bottom right corner of the bounding box.
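The corner representation described above is only one way to store a box. As a minimal sketch, the functions below convert it into two other layouts: top-left corner plus width and height (the layout commonly associated with COCO-style annotations), and a center point plus width and height normalized by image size (commonly associated with YOLO-style label files). The function names are hypothetical and chosen for illustration only.

```python
def corners_to_xywh(x_min, y_min, x_max, y_max):
    """Convert corner coordinates to (x_min, y_min, width, height)."""
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def corners_to_normalized_center(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert corner coordinates to (x_center, y_center, width, height),
    with every value divided by the image width or height."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return (x_center, y_center, width, height)


# Example: a box from (10, 20) to (110, 220) in a 200x400 image.
box_xywh = corners_to_xywh(10, 20, 110, 220)
box_norm = corners_to_normalized_center(10, 20, 110, 220, 200, 400)
```

Whichever layout a project uses, it helps to record it in the annotation guidelines so that tools and training code agree on what the four numbers mean.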

Bounding box annotation can be a time-consuming process, but it is an important step in training machine learning models for object detection tasks.

Why Are Bounding Boxes Important?

There are several reasons why bounding boxes are important:

  1. Object detection: Bounding boxes can be used to identify and locate objects in an image or video. This is useful for applications such as image classification, object tracking, and face detection.
  2. Image annotation: Bounding boxes can be used to label and annotate objects in an image, providing valuable information for image databases and machine learning algorithms.
  3. Data visualization: Bounding boxes can be used to visualize and understand data in a more intuitive way. For example, they can be used to highlight specific features or patterns in an image.
  4. Object recognition: Bounding boxes can be used to recognize objects in an image or video by comparing the shape and position of the bounding box to a database of known objects.

Types of Bounding Boxes

There are several types of bounding boxes that can be used depending on the specific application and the characteristics of the objects being enclosed. Some common types of bounding boxes include:

  1. Axis-aligned bounding boxes (AABBs): These bounding boxes are aligned with the x and y axes of the coordinate system and are commonly used in 2D computer graphics.
  2. Minimum bounding boxes (MBBs): These bounding boxes enclose an object with the minimum possible area, making them useful for applications such as object recognition and image compression.
  3. Rotated bounding boxes: These bounding boxes can be rotated to better enclose an object that is not oriented horizontally or vertically.
  4. Oriented bounding boxes (OBBs): These bounding boxes are similar to rotated bounding boxes, but they are defined by a center point and three axes that are perpendicular to each other. OBBs are useful for objects that have a more complex shape or orientation.
  5. Minimum volume bounding boxes: These bounding boxes enclose an object with the minimum possible volume, making them useful for 3D applications.
  6. Convex hull bounding boxes: These bounding boxes enclose an object with a convex polygon, which is a shape with no indentations or inward curves. Convex hull bounding boxes are useful for objects with a complex or irregular shape.
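To make the first type in the list concrete, here is a minimal sketch that computes the axis-aligned bounding box of an object's outline. It assumes the outline is given as a list of (x, y) points; the function name is hypothetical.

```python
def axis_aligned_bbox(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Example: the outline of a diamond-shaped object.
outline = [(3, 1), (7, 4), (5, 9), (1, 6)]
bbox = axis_aligned_bbox(outline)
```

Because the box only tracks the extreme x and y values, it is cheap to compute, but it can include a lot of empty space around objects that are not aligned with the axes.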

Best Practices for Bounding Box Annotation

There are several practices that can be followed to ensure the quality and accuracy of bounding box annotations. These include:

Ensuring tightness

It is important that the bounding box is tight: large enough to capture the whole object of interest, but not so loose that it includes unnecessary background or other objects in the image. Excluding other objects is good practice. This can be achieved by carefully examining the image and adjusting the size and shape of the bounding box as needed.

Ensuring Intersection over Union (IoU)

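IoU is a metric used to evaluate the accuracy of object detection algorithms. It is the area of overlap between two boxes divided by the area of their union, so it ranges from 0 (no overlap) to 1 (identical boxes). To ensure a high IoU, the predicted bounding box should overlap as much as possible with the ground truth bounding box (the bounding box annotated by a human).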
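As a minimal sketch of how IoU can be computed, the function below takes two axis-aligned boxes in (x_min, y_min, x_max, y_max) form. The function name and the box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Example: a ground truth box and a prediction shifted by half its size.
ground_truth = (0, 0, 10, 10)
prediction = (5, 5, 15, 15)
score = iou(ground_truth, prediction)  # 25 / 175
```

The same function can also be used to compare the boxes that two annotators drew for the same object, which gives a rough measure of how consistently the guidelines are being applied.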

Ensuring pixel-perfect tightness

In some cases, it may be important to achieve pixel-perfect tightness in bounding box annotation, which means ensuring that the bounding box encloses the object of interest as tightly as possible without including any pixels that do not belong to the object.

Avoiding or reducing overlap

As discussed above, each bounding box should contain only one object of interest. Excessive overlap between bounding boxes can confuse the machine learning model and hurt its performance. Where boxes overlap heavily, tighten them around their respective objects to reduce the overlap.
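One way to find candidates for review is to flag pairs of boxes in the same image that overlap heavily. The sketch below measures overlap as the intersection area divided by the area of the smaller box, so a box nested inside another scores 1.0. The 0.5 threshold and the function names are hypothetical and should be tuned per dataset, since some scenes legitimately contain overlapping objects.

```python
from itertools import combinations


def overlap_fraction(a, b):
    """Intersection area divided by the area of the smaller box.
    Boxes are (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (inter_w * inter_h) / smaller if smaller > 0 else 0.0


def flag_overlaps(boxes, threshold=0.5):
    """Return index pairs (i, j) whose overlap fraction exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if overlap_fraction(a, b) > threshold
    ]


# Example: the second box sits entirely inside the first; the third is separate.
boxes = [(0, 0, 10, 10), (2, 2, 8, 8), (20, 20, 30, 30)]
to_review = flag_overlaps(boxes)
```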

Annotating diagonal items

Annotating diagonal objects with bounding boxes can be challenging because an axis-aligned box drawn around a diagonal object takes up far more area than the object itself, so most of the box is background. Careful examination of the image and adjustment of the bounding box size and shape may be necessary to annotate diagonal objects accurately.

One of the best practices when annotating diagonal objects is to use polygons and instance segmentation instead.
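To illustrate why diagonal objects are a poor fit for axis-aligned boxes, the sketch below estimates how much of the box a rotated object actually fills. It assumes, for simplicity, that the object is a perfect rectangle of a given length and width rotated by a given angle; the function name is hypothetical.

```python
import math


def aabb_fill_ratio(length, width, angle_deg):
    """Fraction of the axis-aligned bounding box covered by a rectangle
    of the given length and width rotated by angle_deg degrees."""
    t = math.radians(angle_deg)
    # Extent of the rotated rectangle along the x and y axes.
    box_w = length * abs(math.cos(t)) + width * abs(math.sin(t))
    box_h = length * abs(math.sin(t)) + width * abs(math.cos(t))
    return (length * width) / (box_w * box_h)


# Example: a long, thin object (100 x 10 pixels) at 0 and at 45 degrees.
upright = aabb_fill_ratio(100, 10, 0)    # the object fills the whole box
diagonal = aabb_fill_ratio(100, 10, 45)  # roughly 0.17: most of the box is background
```

Under these assumptions the same object goes from filling the entire box to filling about one sixth of it, which is the kind of case where a polygon describes the object far more faithfully.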

Labeling and Tagging Names

It is important to label every object of interest. Machine learning models are built to map pixel patterns to labels, so complete and correct labels help the model achieve high accuracy.

Box size

The size of each bounding box must vary with the size of its object. If all the bounding boxes are the same size, the model will not perform well. For instance, if a small object is given a box as large as the one used for a bigger object, the box will capture unnecessary objects, which will confuse the model. So always make sure that the box tightly captures the object of interest.

Also, make sure that you consider the model’s input size as well as the network’s downsampling factor. If the bounding boxes are too small, the object’s information may be lost in the downsampling stages of your network architecture.
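A rough way to check this is to estimate how many feature-map cells a box will cover after the image is resized and downsampled. The sketch below assumes the image is resized to a square input of `input_size` pixels without preserving aspect ratio, and that `stride` is the network's total downsampling factor. Both parameters, the one-cell cutoff, and the function names are hypothetical, and real preprocessing pipelines may differ.

```python
def feature_map_extent(box_w, box_h, img_w, img_h, input_size, stride):
    """Approximate (width, height) of a box, in feature-map cells, after the
    image is resized to input_size x input_size and downsampled by stride."""
    cells_w = box_w * (input_size / img_w) / stride
    cells_h = box_h * (input_size / img_h) / stride
    return (cells_w, cells_h)


def is_too_small(box_w, box_h, img_w, img_h, input_size, stride, min_cells=1.0):
    """True if the box covers less than min_cells along either axis."""
    cells_w, cells_h = feature_map_extent(box_w, box_h, img_w, img_h, input_size, stride)
    return min(cells_w, cells_h) < min_cells


# Example: a 40x30 pixel box in a 1920x1080 image, 640-pixel input, stride 32.
extent = feature_map_extent(40, 30, 1920, 1080, 640, 32)
needs_attention = is_too_small(40, 30, 1920, 1080, 640, 32)
```

Boxes flagged this way are candidates for a larger input size, image tiling, or exclusion from the dataset, depending on how important the small objects are to the task.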

Annotation of occluded objects

An object that is not in full view because another object obstructs it is called an occluded object. Such objects provide only partial information because they are blocked. A few points to keep in mind while annotating or tagging occluded objects are:

  1. Use bounding boxes if more than 60% of the object is visible.
  2. Use polygons if the visibility of the object is 30% to 60%.
  3. In any case, if the obstructing object is at the center of the object of interest, annotate the object with a bounding box just as you would a fully visible one.
  4. If only a small part is visible then ignore the object. It would only serve as noise to the training dataset.
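The rules above can be sketched as a small decision function. This is one possible reading of them, not a definitive rule set: it assumes that "a small part" means less than 30% visible and that the center-occlusion rule takes precedence over the others. The function name and return values are hypothetical.

```python
def occlusion_annotation(visible_fraction, occluder_at_center=False):
    """Suggest an annotation type for an occluded object.

    visible_fraction: estimated share of the object that is visible (0.0 to 1.0).
    occluder_at_center: True if the obstruction covers the middle of the object.
    """
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be between 0 and 1")
    if occluder_at_center:
        return "bounding_box"  # rule 3: treat like a fully visible object
    if visible_fraction > 0.60:
        return "bounding_box"  # rule 1
    if visible_fraction >= 0.30:
        return "polygon"       # rule 2
    return "ignore"            # rule 4 (assumed cutoff: below 30%)


# Example decisions for a few objects.
decisions = [
    occlusion_annotation(0.80),
    occlusion_annotation(0.45),
    occlusion_annotation(0.10),
    occlusion_annotation(0.45, occluder_at_center=True),
]
```

Writing the thresholds down in one place like this makes it easier for a team of annotators to apply them in the same way.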

Tag every object of interest in an image

The more information provided, the better the model will be. Keeping all the points above in mind, it is important to tag every object of interest so that the model performs well.

Additional Tips and Tricks

Here are five tricks that can help to improve the efficiency and accuracy of bounding box annotation:

  1. Use appropriate tools: Use appropriate tools and software to draw the bounding boxes and label the objects. Some tools allow you to draw the bounding boxes directly on the image, while others may require you to specify the coordinates of the bounding box. Choose a tool that is easy to use and allows you to annotate the images efficiently.
  2. Use keyboard shortcuts: Many bounding box annotation tools provide keyboard shortcuts that can help to speed up the annotation process. For example, you can use the arrow keys to move the bounding box or the + and – keys to resize it.
  3. Use a template: If you are annotating multiple images that contain similar objects, you can create a template to save time. A template is a pre-drawn bounding box that you can use as a starting point for annotating the images. You can then adjust the size and position of the bounding box as needed to fit the object in the image.
  4. Use artificial intelligence: Some bounding box annotation tools use artificial intelligence to speed up the annotation process. For example, you can use a tool that automatically detects the objects in the image and generates a bounding box around them. You can then fine-tune the bounding box as needed to ensure accuracy.
  5. Use a consistent approach: To ensure consistency in the annotation process, it’s important to use a consistent approach when drawing the bounding boxes and labeling the objects. This can include following a set of guidelines or using a specific method for drawing the bounding boxes. By following a consistent approach, you can reduce the risk of errors and improve the accuracy of the annotations.
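A consistent approach is easier to maintain when basic guideline checks are automated. As a minimal sketch, the function below checks a single annotation for three common problems. The specific checks, the (x_min, y_min, x_max, y_max) box layout, and all names are assumptions made for illustration, not part of any particular annotation tool.

```python
def validate_annotation(box, label, img_w, img_h, allowed_labels):
    """Return a list of problems found in one annotation (empty if none).

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    problems = []
    x_min, y_min, x_max, y_max = box
    if x_min >= x_max or y_min >= y_max:
        problems.append("corners out of order or zero-size box")
    if x_min < 0 or y_min < 0 or x_max > img_w or y_max > img_h:
        problems.append("box extends outside the image")
    if label not in allowed_labels:
        problems.append(f"unknown label: {label}")
    return problems


# Example: one clean annotation and one with several problems.
labels = {"car", "person"}
clean = validate_annotation((10, 10, 50, 50), "car", 100, 100, labels)
flawed = validate_annotation((60, 10, 50, 120), "Car", 100, 100, labels)
```

Running a check like this over every annotation before training catches mistakes such as inconsistent label spelling early, when they are cheap to fix.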

Conclusion


As data becomes more widely available, the volume of images that need annotating grows. A consistent and effective workflow helps you annotate a dataset efficiently. This article covered some of the best practices and tricks that can make you more productive when annotating datasets for computer vision tasks.