“Finding a Tiny Pest in a Big Field”
It's all about the numbers
Problem: Finding the Right Data Labeling Partner
This Agrotech company had millions of aerial images and needed to identify tiny pests in a huge field. Its internal Data Science teams found this extremely difficult because only 2% of the images actually contained pests. They were spending hundreds of hours reviewing images that turned out not to contain a single pest. This was not an efficient use of their Data Scientists’ time and resources, and they knew they needed to make a change.
Solution: Tasq.ai Grid Technology for the Win - Cleaning Data at Rapid Speed
When this Agrotech company saw Tasq.ai’s process of deconstructing large datasets into millions of micro-tasks, they knew this was their solution. At the peak of the job, Tasqers were labeling millions of images a day. This made their data labeling 30x faster, at a level of accuracy that they couldn’t achieve with other leading labeling solutions. By breaking each much larger image down into tiny parts, our Tasqers were able to identify whether a pest was present and, if so, what kind. This allowed our customer to launch multiple models instead of one, covering both whether a pest was present and the specific type of pest. This also accelerated their roadmap by 9 months, giving them a more immediate and accurate way of identifying the problem and resolving it substantially faster. See below for the workflow that the Tasq.ai team implemented.
The workflow built by our Customer Success team was as follows:
- Break down every image into a grid of hundreds of smaller images, each focusing on one zoomed-in area at a time.
- Build a task for spotting, identifying and marking every object, using multiple dynamic judgements to produce confidence scores.
- Build a small initial ‘Gold Dataset’ that covers every type of pest, to train the Tasqers against and to check their level of accuracy over time.
- Enlarge the ‘Gold Dataset’ dynamically once positive annotations start to arrive, allowing more Tasqers to enter the job.
- Gamify the tasks to create interest and avoid ‘boredom’.
- Limit each Tasqer to only a few minutes of work in order to avoid burnout and exhaustion.
- Automatically label bounding boxes around every marked object using ML for high confidence.
- Have humans classify every object per the catalogue, with multiple judgements to ensure high-quality, high-confidence results.
- Aggregate all results and confidence scores per annotation into one single structured image.
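To make the grid, Gold Dataset, and aggregation steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Tasq.ai's actual implementation: the function names (`tile_grid`, `gold_accuracy`, `aggregate`), the tile size, the pest labels, and the use of a simple majority vote with the agreement fraction as the confidence score are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One grid cell of a larger image, in pixel coordinates."""
    row: int
    col: int
    left: int
    top: int
    right: int
    bottom: int


def tile_grid(width, height, tile_size):
    """Split a width x height image into square tiles; edge tiles are clipped."""
    tiles = []
    for row, top in enumerate(range(0, height, tile_size)):
        for col, left in enumerate(range(0, width, tile_size)):
            right = min(left + tile_size, width)
            bottom = min(top + tile_size, height)
            tiles.append(Tile(row, col, left, top, right, bottom))
    return tiles


def gold_accuracy(answers, gold):
    """Fraction of gold tiles that a worker labeled correctly.

    Only tiles that appear in the gold set are scored (assumed scoring rule).
    """
    checked = [tile_id for tile_id in answers if tile_id in gold]
    if not checked:
        return 0.0
    correct = sum(answers[tile_id] == gold[tile_id] for tile_id in checked)
    return correct / len(checked)


def aggregate(judgements):
    """Majority vote over several judgements of the same object.

    Returns (label, confidence), where confidence is the share of
    judgements that agree with the winning label (an assumed definition).
    Ties go to the label that was seen first.
    """
    counts = Counter(judgements)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgements)


# Hypothetical usage: a 1000 x 600 image cut into 256-pixel tiles.
tiles = tile_grid(1000, 600, 256)
label, confidence = aggregate(["aphid", "aphid", "none"])
worker_score = gold_accuracy(
    {"t1": "aphid", "t2": "none", "t3": "mite"},
    {"t1": "aphid", "t2": "aphid"},
)
```

In this sketch, a worker's score on the gold tiles could be used to decide whether their judgements are counted, and the agreement fraction could decide whether a tile needs more judgements. Both thresholds are left out because the source does not state them.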
Millions of images were scanned at high speed by hundreds of thousands of Tasqers to create a high-quality dataset, improving the client’s model faster than expected. This gave the client a superior capability over its competitors, leading to immediate success.