What is Computer Vision? Brief Overview & Applications

A brief introduction to computer vision

If we asked you to name the objects in the picture below, you would probably come up with words like “car”, “van”, “streetlight”, and “sign” without having to think too hard. It’s a simple task that almost anyone can do.

Behind the scenes, however, a very complicated neural process takes place. Human vision is an intricate system that involves not only our eyes but also our minds: our perception, our understanding of concepts, and the personal experience built up through billions of interactions with the physical world over our lifetimes.

While modern digital equipment can capture images like this with a level of detail that exceeds our own visual capabilities, making sense of what’s in those images is something that computers don’t naturally excel at. To a standard computer, the image above is nothing more than an amalgamation of pixels with differing color intensities. That’s it.

But there’s a field of computer science that has been working hard for decades to enable machines to read and interpret images in the same way we do—computer vision.

Seeing through the eyes of a machine

While this may sound straightforward, it’s anything but.

Computer vision is one of the most challenging fields in computer science, not only because of the dynamic complexity of our physical world but also because it attempts to replicate human eyesight, the product of hundreds of millions of years of evolution.

If we’re able to perfect it, though, its value to our world will be immeasurable. This is perhaps best illustrated by the fact that the global computer vision market size is expected to grow at a compound annual growth rate of 45.64% between 2021 and 2028, culminating in a total value of US$144.46 billion as research efforts intensify.

What is computer vision?

Computer vision is a field of artificial intelligence (AI) that focuses on replicating parts of the human visual system and enabling machines such as computers, mobile devices, and cameras to “see” their surroundings and derive useful information from them.

The goal of computer vision is to imitate the human eye so that computers, with the help of hardware, algorithms, and data, can be trained to interpret images and perform an increasingly complex range of tasks and functions in response. And thanks to recent advances in AI and deep learning, both research into and the real-world application of computer vision are making astounding progress.

What are examples of computer vision applications?

The value of computer vision lies in the problems it can solve. It’s the main technology that enables the digital world to interact with our physical world.

A classic example of a computer vision application is the self-driving car. While some models, such as the Tesla Model X, already offer partial self-driving capabilities, advances in computer vision could lead to fully autonomous “level 5” vehicles that require no human interaction at all.

Computer vision also plays an important role in facial recognition, which enables computers to match images of people’s faces to their identities. It’s also the technology that powers augmented and virtual reality, and it has been critical in the development of modern health technology. Computer vision algorithms can help to automate tasks such as detecting fractures in X-rays and cancerous moles in images of skin.

There are several more nuanced applications, too. Smart home security, pest detection in farming, tracking objects through a sequence of images, image segmentation, retail inventory management, engagement monitoring in education, and more are all examples of areas where computer vision can add substantial value.

Common computer vision tasks

Applications like self-driving cars and AI-powered medical diagnostics rely on a range of key computer vision tasks. These include:

Object detection

Object detection is the ability of a computer to locate objects in an image, recognize them, and classify them correctly. The position of each object is typically captured using bounding boxes or other shapes such as polygons or ellipses.
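
As a rough sketch of what a detector’s output might look like, here is a minimal, self-contained example (the class names, scores, and box coordinates are invented for illustration and don’t reflect any particular model or API):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                      # predicted class, e.g. "car"
    confidence: float               # model confidence in [0, 1]
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

# Hypothetical detections for a street scene: each object is both
# located (the bounding box) and classified (the label).
detections: List[Detection] = [
    Detection("car", 0.97, (34, 120, 210, 260)),
    Detection("streetlight", 0.88, (400, 15, 430, 180)),
]

for d in detections:
    print(f"{d.label} ({d.confidence:.0%}) at {d.box}")
```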

Image classification

Image classification in computer vision is the classification of an image as a whole, as opposed to its individual objects or components. An example of an image class might be “field” or “forest”.
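
To contrast this with object detection, a whole-image classifier produces a single label (or a score per class) for the entire image. A minimal sketch, with invented classes and scores:

```python
# Hypothetical class scores for one aerial photo. The classifier says
# something about the image as a whole, not about individual objects.
class_scores = {"field": 0.81, "forest": 0.14, "urban": 0.05}

predicted_class = max(class_scores, key=class_scores.get)
print(predicted_class)  # -> "field"
```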

Semantic segmentation

Semantic segmentation is an advanced image annotation method that assigns a class label to every pixel in an image, so that whole regions are classified rather than just boxed. In an image containing several people, for example, every pixel belonging to a person would receive the same “person” class, with no distinction made between individual people.
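
A minimal sketch of what a segmentation prediction looks like in practice, assuming a tiny image and an invented set of classes: the output is a mask with one class index per pixel.

```python
import numpy as np

CLASSES = {0: "background", 1: "person", 2: "car"}

# Hypothetical 4x6 prediction mask: every "person" pixel shares class 1,
# with no distinction made between individual people.
mask = np.array([
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [2, 2, 0, 1, 1, 0],
    [2, 2, 0, 0, 0, 0],
])

for class_id, name in CLASSES.items():
    print(f"{name}: {(mask == class_id).sum()} pixels")
```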

What’s next for computer vision?

It has taken decades of research for computer vision to get to where it is now, and recent breakthroughs in AI and deep learning are pushing the field forward at an astonishing rate.

Yet there are still plenty of unknowns, not only in how we can unlock the full potential of computer vision but also in our incomplete biological understanding of human vision itself.

While it’s hard to say exactly what the future of computer vision will look like, what we do know is that our use of the technology is bound to grow. If the examples above (autonomous vehicles, AI-powered medical technology, and agricultural technology) are anything to go by, the future looks bright.

About Tasq.ai

At Tasq, we are helping AI companies develop better machine learning models and computer vision applications by providing them with a best-in-class data labeling solution. This allows them to take their computer vision products to the next level in terms of accuracy and quality.

Want to learn more about the Tasq.ai platform? Check out our blog!

Needle in a haystack: Using Tasq for large-scale labeling of tiny objects

Crop scouting is an important process in farming. It involves assessing pest pressure (typically from insects) and crop performance to evaluate the potential risk from pest infestations, weeds, disease, and other threats. Regular crop scouting during the growing season helps farmers make timely and informed decisions to protect their crop yields.

Historically, crop scouting would be carried out by human scouts. These people would be responsible for walking through vast crop fields and documenting their findings. This is a time-consuming and costly method that often results in late detection of disease and pests.

Thanks to advanced artificial intelligence (AI) agrotech solutions, however, this time-consuming and costly method has been replaced by approaches that are quicker, cheaper, and more accurate.

What is Agrotech?

Agrotech (agricultural technology) solutions help farmers scout their fields for problems such as disease or pests. A range of different solutions is available, and many turn low-cost commercial drones into digital crop scouts using powerful AI-backed platforms.

In a 20-minute walk, a crop scout might be able to check 150 potato plants. In a 20-minute flyover, however, a robust agrotech solution could cover 10,000.

To improve the power and accuracy of its platform, one agrotech provider recently sought out the right data labeling partner: Tasq.ai.

The challenge: Cleaning up a massive amount of data

The agrotech provider had amassed a huge number of aerial images of potato fields, and it needed to identify a tiny pest that’s endemic in this environment: the Colorado potato beetle.

This type of beetle becomes active in spring, around the same time as potato plants emerge from the ground. The beetles feed on the potato plant’s leaves and can completely defoliate the plants. While potato plants can usually withstand infestations early in the season, it’s important for farmers to act quickly before the damage escalates.

Since only 2% of these images contained actual pests, the company’s internal data scientists found labeling them extremely inefficient. As a result, a huge amount of time was wasted reviewing images without a single pest in them, time that could have been better spent on other tasks.

The solution: Tasq.ai dynamic judgments

In tackling this problem, our primary challenge was cleaning up the data and detecting which images contained beetles. We experimented with two distinct approaches:

Approach 1. Ask one group of users to put a dot on each beetle they see, then ask two other groups of users to mark the beetles with bounding boxes and classify them.

Approach 2. Simply ask users, “Do you see a beetle in the image?” and then have beetles marked only on the images where users indicated that they saw one.

In the end, we opted for the second approach.

Firstly, the obvious advantage of this approach was that a yes/no question is a much simpler task, which led to a higher engagement rate and sped up the project.

Secondly, it’s much easier to aggregate judgments from the answers to a yes/no question than from a graphical annotation.

Overall, the second approach improved the quality of the aggregated answer because it was easier to compare multiple judgments and arrive at a final result. Combined with our dynamic judgments feature, where we stop collecting judgments once we hit a defined agreement level, we were able to cut the number of required judgments by 30%, making the whole process 30% faster and cheaper.
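
The details of the dynamic judgments feature are internal to the platform, but the core idea can be sketched in a few lines: keep collecting yes/no judgments for an image and stop as soon as one answer reaches a defined agreement level. The threshold and minimum count below are illustrative assumptions, not Tasq’s actual settings.

```python
from collections import Counter
from typing import Iterable, Optional

def aggregate_dynamic(judgments: Iterable[str],
                      agreement: float = 0.8,
                      min_judgments: int = 3) -> Optional[str]:
    """Collect judgments until one answer reaches the agreement level."""
    counts: Counter = Counter()
    for answer in judgments:
        counts[answer] += 1
        total = sum(counts.values())
        leader, leader_count = counts.most_common(1)[0]
        if total >= min_judgments and leader_count / total >= agreement:
            return leader  # stop early: no further judgments are requested
    return None  # the stream ended without reaching agreement

# Three unanimous "no" answers are enough; the last two judgments are never needed.
print(aggregate_dynamic(iter(["no", "no", "no", "yes", "no"])))  # -> "no"
```

Stopping as soon as the agreement level is reached is what allows fewer judgments to be collected per image without changing the final answer for clear-cut cases.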

The process: Break down each dataset and create micro-tasks

By breaking down each dataset into millions of micro-tasks, our data labelers were able to identify whether a pest was present in each image. Thanks to this process, the agrotech platform was able to vastly improve the accuracy of its pest detection ML model.

At the peak of this engagement, our data labeling experts were using the Tasq platform to label a huge number of images per day. This meant that the agrotech platform was able to increase its data labeling speed by more than a factor of 30: instead of having its own experts review tens of thousands of images, the company could focus on the 2% of images where a pest was actually present.

The result: A superior ML model

Throughout this project, vast datasets were processed at lightning speed by hundreds of thousands of individual data labeling experts thanks to the power of the Tasq data labeling platform.

The result of this was a high-quality dataset and a superior ML model achieved much more quickly and at a lower cost than anything possible with other providers.

Using sub-labels for robust image annotation QA

The images that you use to train, validate, and test your machine learning (ML) algorithms have a significant impact on the success of your computer vision project.

As a result, image annotation has become a ubiquitous process in recent years and is necessary for almost any application that relies on artificial intelligence (AI) and ML.

Best practices for image annotation

For computer vision projects, every single image within every dataset must be thoroughly and accurately processed and annotated in order to train an ML model to recognize the world in a way similar to how we humans do.

Due to how important it is to develop the most accurate and reliable ML models possible for any AI application—and, indeed, how much more important it is becoming with the passage of time—there are a few critical best practices to be mindful of. These include:

1. Robust dataset collection and processing

You can’t feed random datasets into an ML model and expect it to learn. It’s important to collect and use only data that, while diverse, is also highly specific to the problem statement. This enables ML models to be trained to work in multiple real-world scenarios while also reducing the chance of errors and bias.

2. A reliable and proven annotation process

Once data has been collected and processed, the next and arguably most important step is annotation. Image data is annotated through a process known as data labeling, and there are multiple ways to do this. It’s important to think carefully about which approach to take: the right image annotation approach will help keep costs down while ensuring accuracy.

3. Thorough quality assurance checks

Quality assurance (QA) and validation checks are critical for ensuring that data has been annotated correctly. This is especially true where images have been annotated via crowdsourcing.

We’ve written previously about our QA and validation features and methods such as dynamic judgments, confidence and agreement levels, and adaptive sampling; QA is a critical component of all of our workflows. Another tool in our QA belt is our sub-labels feature, which we use to achieve even higher-quality outcomes.

Tasq’s sub-labels feature

At the most basic level, image annotation validation is carried out by asking users a binary question.

An image might contain a dog, for example, and as part of the QA process a human might be asked, “Does this image contain a correctly annotated image of a dog?” with the option of answering yes or no.

Alternatively, they might be presented with the statement, “This image contains a correctly annotated image of a dog” and asked whether they agree or disagree.

When we annotate images with the crowd, however, we prefer to introduce more detailed labels in the form of different types of mistakes or rejections to achieve a higher quality result. This is our sub-labels feature, and it helps us to better understand the rationales of our data annotators and improve training.

The sub-labels feature in action

The sub-labels feature works by translating all the different types of negative responses available (“no”, “disagree”, “incorrect label”, “wrong class”, “unmarked object”, “inaccurate bounding box”, etc.) into a single “No” answer and treating it as such for downstream processes such as adaptive sampling and dynamic judgments.

Let’s return to the above example of an image containing a dog.

When presented with the question “Does this image contain a correctly annotated image of a dog?”, our annotators might have the following options:

  • Yes
  • No: Inaccurate bounding box
  • No: Wrong class
  • No: Incorrect label

Let’s say five people vote on the image and the results are as follows:

  • Yes (2 votes)
  • No: Inaccurate bounding box (1 vote)
  • No: Wrong class (1 vote)
  • No: Incorrect label (1 vote)

That’s two votes for Yes and one vote for each of the other options.

Without our sub-labels feature, the merged result would be Yes because two beats one. This is clearly problematic: while two people have said the image is annotated correctly, three have each highlighted what they perceive to be a different error, meaning that the image might not be correctly annotated.

With our sub-labels feature, however, all the different negative votes are aggregated into a single No answer. In this case, it is correctly identified that two people voted Yes while three people gave a negative answer, so there are three No votes and the merged result becomes No.
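
A minimal sketch of this merging step, reproducing the vote breakdown above (the helper function and sub-label strings are illustrative, not Tasq’s actual implementation):

```python
from collections import Counter

def merge_with_sublabels(votes):
    """Collapse every negative sub-label into a single "No" before taking the majority."""
    collapsed = ["No" if v.startswith("No") else "Yes" for v in votes]
    counts = Counter(collapsed)
    return counts.most_common(1)[0][0], dict(counts)

votes = [
    "Yes", "Yes",
    "No: Inaccurate bounding box",
    "No: Incorrect label",
    "No: Wrong class",
]

result, counts = merge_with_sublabels(votes)
print(result, counts)  # -> No {'Yes': 2, 'No': 3}
```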

The result of this is clear: higher accuracy, which translates into more robust predictive ML models.

Rising to the challenge

Almost all ML models work on the assumption that the data they have been provided with is completely accurate. We don’t live in a perfect world, though; nothing can ever be accurate 100% of the time.

Inaccuracies in image annotation often result in ML models that can’t perform at their optimum, bringing down the overall predictive accuracy of an AI application. Data labeling tasks such as image annotation are therefore among the biggest challenges in building reliable AI systems today.

If you would like to find out more about how Tasq’s features could take your image annotation workflow to the next level, why not sign up for a 30-minute free demo?