The images that you use to train, validate, and test your machine learning (ML) algorithms have a significant impact on the success of your computer vision project.

As a result, image annotation has become a ubiquitous process in recent years and is necessary for almost any application that relies on artificial intelligence (AI) and ML.

Best practices for image annotation

For computer vision projects, every single image within every dataset must be thoroughly and accurately processed and annotated to sufficiently train an ML model to recognize the world in much the same way we humans can.

Developing the most accurate and reliable ML models possible is essential for any AI application, and it is only becoming more so over time. There are a few critical best practices to be mindful of. These include:

1. Robust dataset collection and processing

You can’t feed random datasets into an ML model and expect it to learn. It’s important to collect and use only data that, while diverse, is also highly specific to the problem statement. This enables ML models to be trained to work in multiple real-world scenarios while also reducing the chance of errors and bias.

2. A reliable and proven annotation process

Once data has been collected and processed, the next and arguably most important step is annotation. Image data is annotated through a process known as data labeling, and there are multiple ways to do this. It’s important to think carefully about which approach to take: the right one will help keep costs down while ensuring accuracy.

3. Thorough quality assurance checks

Quality assurance (QA) and validation checks are critical for ensuring that data has been annotated correctly. This is especially true where images have been annotated via crowdsourcing.

We’ve written previously about our QA and validation features and methods such as dynamic judgments, confidence and agreement levels, and adaptive sampling; QA is a critical component of all of our workflows. Another tool in our QA belt is our sub-labels feature, which we use to achieve even higher-quality outcomes.
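To make the idea of agreement levels concrete, here is a minimal sketch in Python. It uses generic names rather than Tasq’s actual API, and simply scores how strongly a group of annotators agrees on a single judgment:

```python
from collections import Counter

def agreement_level(judgments: list[str]) -> float:
    """Fraction of judgments that match the most common answer.

    A simple, generic proxy for annotator agreement; Tasq's own
    confidence and agreement metrics may be computed differently.
    """
    if not judgments:
        return 0.0
    _, top_count = Counter(judgments).most_common(1)[0]
    return top_count / len(judgments)

# Example: 4 of 5 annotators give the same answer -> agreement of 0.8
print(agreement_level(["Yes", "Yes", "Yes", "Yes", "No"]))
```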

Tasq’s sub-labels feature

At the most basic level, image annotation validation is carried out by asking users a binary question.

An image might contain a dog, for example, and as part of the QA process a human might be asked, “Does this image contain a correctly annotated image of a dog?” with the option of answering yes or no.

Alternatively, they might be presented with the statement, “This image contains a correctly annotated image of a dog” and asked whether they agree or disagree.

When we annotate images with the crowd, however, we prefer to introduce more detailed labels in the form of different types of mistakes or rejections to achieve a higher quality result. This is our sub-labels feature, and it helps us to better understand the rationales of our data annotators and improve training.

The sub-labels feature in action

The sub-labels feature works by translating all of the different negative responses available (“no”, “disagree”, “incorrect label”, “wrong class”, “unmarked object”, “inaccurate bounding box”, and so on) into a single “No” answer, which is then treated as such by processes further down the line, such as adaptive sampling and dynamic judgments.
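As a rough illustration, that mapping might look like the following sketch. The response strings are assumptions for the example below, not Tasq’s internal implementation:

```python
def to_binary(response: str) -> str:
    """Collapse any detailed negative sub-label (e.g. "No: Wrong class")
    into a single "No" before downstream aggregation."""
    return "No" if response.startswith("No") else "Yes"
```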

Let’s return to the above example of an image containing a dog.

When presented with the question “Does this image contain a correctly annotated image of a dog?”, our annotators might have the following options:

  • Yes
  • No: Inaccurate bounding box
  • No: Wrong class
  • No: Incorrect label

Let’s say five people vote on the image and the results are as follows:

  • Yes (2 votes)
  • No: Inaccurate bounding box (1 vote)
  • No: Wrong class (1 vote)
  • No: Incorrect label (1 vote)

There are two votes for Yes and one vote for each of the negative options.

Without our sub-labels feature, the merged result would be Yes, because two votes beat one. This is clearly problematic: while two people have said the image is annotated correctly, three have highlighted what they perceive to be different errors, meaning the image might not be correctly annotated.

With our sub-labels feature, however, all of the different negative votes are aggregated into a single No vote. In this case, it is correctly identified that two people voted Yes while three voted for a negative answer, giving three No votes and a merged result of No.
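In code, the difference between the two merge strategies looks something like this. It is a minimal sketch of the example above, not Tasq’s production logic:

```python
from collections import Counter

votes = [
    "Yes", "Yes",
    "No: Inaccurate bounding box",
    "No: Wrong class",
    "No: Incorrect label",
]

# Naive merge: every sub-label counts as its own option, so "Yes" wins 2-1-1-1.
naive_result = Counter(votes).most_common(1)[0][0]

# Sub-label merge: collapse the negatives to "No" first, so "No" wins 3-2.
merged_result = Counter(
    "No" if v.startswith("No") else "Yes" for v in votes
).most_common(1)[0][0]

print(naive_result)   # Yes
print(merged_result)  # No
```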

The result is clear: higher annotation accuracy, which translates into more robust predictive ML models.

Rising to the challenge

Almost all ML models work on the assumption that the data they have been provided with is completely accurate. We don’t live in a perfect world, though; nothing can ever be accurate 100% of the time.

Inaccuracies in image annotation often result in ML models that can’t perform at their best, dragging down the overall predictive accuracy of an AI application. Getting data labeling tasks such as image annotation right is therefore one of the biggest challenges in AI today.

If you would like to find out more about how Tasq’s features could take your image annotation workflow to the next level, why not sign up for a 30-minute free demo?