

There are many reasons why your business or organization might need its data labeled. You may be training a machine learning algorithm on a certain set of data points, or you may want to extract as much information as possible from your current dataset.

While a machine needs data inputs in order to learn and improve its models, it also needs a further step: validation. Validation assesses the model's ability to produce high-quality output, and those results feed back into training so the algorithm continuously improves, optimizing both the accuracy and the precision of the model.

In this article, we will discuss how our data annotation platform can assist your organization with validation tasks of this nature.

Validation in the workflow

What is model validation?

Let’s say your algorithm is surveying images of traffic cameras, and you want to know where the traffic lights, vehicles, and electricity poles are situated.

After first running your dataset on our crowd platform, you will have the necessary inputs to train your model. But how do you then evaluate how your model performs on a new dataset?

This is where model validation jobs in the platform step in. Instead of providing us with a blank canvas of images to annotate, you provide us with all the data that your engines have already annotated, and we do the validation for you.

Whereas in the first run the crowd identified the required objects in each image and assigned classification labels (vehicle, traffic light, traffic sign), this time around we show the crowd the labels your engine has produced and ask the question in reverse: you provide the bounding box and location of the object in question, along with the label the machine assigned, and we ask the crowd:

“Is the highlighted object a traffic light?”

In such a way, we can use the crowd at scale to validate all the labels your algorithm has provided.

Using a model of multiple judgments per annotation, you receive a high-confidence consensus on whether each label your algorithm provided is correct, and from this we can calculate a score for the algorithm's ability to label the given data.
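To make the aggregation step concrete, here is a minimal sketch (not platform code; the function name and threshold are illustrative) of how several yes/no judgments on one annotation can be collapsed into a consensus verdict with an agreement score:

```python
from collections import Counter

def consensus(judgments, threshold=0.7):
    """Aggregate yes/no crowd judgments for one annotation.

    Returns the majority answer, its agreement ratio, and whether the
    agreement clears the confidence threshold; low-confidence verdicts
    could be routed for additional judgments.
    """
    counts = Counter(judgments)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(judgments)
    return answer, agreement, agreement >= threshold

# Five contributors answer "Is the highlighted object a traffic light?"
print(consensus(["yes", "yes", "yes", "no", "yes"]))
```

With five judgments and four "yes" votes, the label is confirmed with 80% agreement, comfortably above the 70% threshold assumed here.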

What about missing objects?

Validating labels on a given object is straightforward, but what about missing labels? Let's say you have an image with 10 objects to classify, and your algorithm detected 8 of them but labeled only 4 correctly. An initial analysis of the model would therefore show 4 out of 8, or 50%, precision on the labeled data.

But what about recall? In actuality, the model labeled only 4 out of 10 objects, or 40%, correctly, and the only way to know this is to have a human eye review the image and spot what is missing.
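The arithmetic from the example above can be written out in a few lines (a generic sketch, not platform code):

```python
def precision_recall(true_positives, predicted, actual):
    """Precision: share of the model's labels that are correct.
    Recall: share of all real objects the model labeled correctly."""
    precision = true_positives / predicted
    recall = true_positives / actual
    return precision, recall

# The example above: 8 objects detected, 4 labeled correctly,
# 10 objects actually present in the image.
p, r = precision_recall(true_positives=4, predicted=8, actual=10)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=50%, recall=40%
```

The gap between the two numbers is exactly what crowd review of missing objects uncovers: precision alone looks at what the model said, while recall needs a human count of what is really there.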

This can also be done on our platform in a quick and highly scalable way. We show the crowd your images, including all of the annotations you have provided, and ask the follow-up question:

“Are there any other objects to spot in this image?”

If the image contains dozens of objects or is very high resolution, we can also split it into multiple images and collect a number of judgments per part of the image, giving a true picture of how many objects the image contains overall.
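As a rough illustration of that splitting step (again a sketch; the tile size and function name are assumptions, and a real pipeline would also crop the pixels and track tile overlap), the image's pixel grid can be divided into reviewable tiles like this:

```python
def tile_image(width, height, tile=512):
    """Split an image's pixel grid into (left, top, right, bottom)
    tile coordinates so each part can be sent to the crowd as a
    separate, reviewable image."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top,
                          min(left + tile, width),
                          min(top + tile, height)))
    return boxes

# A 1920x1080 frame becomes 4 columns x 3 rows = 12 reviewable tiles.
print(len(tile_image(1920, 1080)))  # 12
```

Judgments collected per tile are then merged back into a single object count for the whole image.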

After this, we can calculate, with very high accuracy, the precision and recall of your model, and we are able to do this at scale. Each time you run your model, we can validate that iteration with the crowd, and you receive results with a quick turnaround and guaranteed high quality.

Conclusion

In conclusion, our platform is an all-in-one solution: it can help you build your models by annotating images from scratch using your given labels, and further down the line we can run our validation workflows to validate the labels your model outputs and provide you with a score you can use to build on and improve.
