Pascal VOC

What is Pascal VOC?

The PASCAL VOC (Visual Object Classes) challenge consists of two components:

  • a publicly available dataset of images and annotations, together with standardized evaluation software;
  • an annual competition and workshop.

The VOC2007 dataset consists of annotated consumer photographs collected from the Flickr photo-sharing website. A new dataset with ground-truth annotation has been released every year since 2006.

There are two principal challenges:

  1. Classification: does the image contain any instance of a particular object class (where the object classes include automobiles, people, pets, and so on)?
  2. Detection: where in the image are the instances of a particular object class, if any?
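The detection question is usually scored by measuring how well a predicted bounding box overlaps a ground-truth box. The following is a minimal sketch rather than the official evaluation code. It assumes boxes are given as inclusive pixel coordinates `(xmin, ymin, xmax, ymax)` and that a detection counts as correct when its intersection-over-union with a ground-truth box exceeds 0.5, the threshold commonly associated with VOC. The function names are illustrative and do not come from any VOC toolkit.

```python
def box_area(box):
    """Area of an inclusive-pixel box (xmin, ymin, xmax, ymax); 0 if empty."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin + 1) * max(0, ymax - ymin + 1)


def iou(a, b):
    """Intersection-over-union of two boxes."""
    # Corners of the overlapping region (may describe an empty box).
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred, truth, threshold=0.5):
    """Assumed rule: a prediction matches if IoU is strictly above threshold."""
    return iou(pred, truth) > threshold


# Two 10x10 boxes, the second shifted right by 5 pixels:
# overlap is 50 pixels, union is 150 pixels.
print(iou((0, 0, 9, 9), (5, 0, 14, 9)))                   # 0.3333...
print(is_correct_detection((0, 0, 9, 9), (5, 0, 14, 9)))  # False
```

A full evaluation would also rank detections by confidence and penalize duplicate detections of the same object; that bookkeeping is omitted here.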

There are also two “tasters”: pixel-level segmentation (assigning a class label to each pixel) and person layout (locating the head, hands, and feet of people in the image).

Every year, deadlines are set for the challenges, and a workshop is held to compare and discuss that year’s results and methods. Afterwards, the datasets, together with the associated annotation and software, are released and remain available for use at any time.
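The segmentation taster mentioned above assigns a class label to every pixel, so a prediction can be compared with the ground truth pixel by pixel. One common way to summarize that comparison is a per-class intersection-over-union. The sketch below assumes this metric for illustration and is not a statement of the official VOC procedure. It also assumes label maps are plain nested lists of integer class indices; `per_class_iou` is a hypothetical helper name.

```python
def per_class_iou(pred, truth, num_classes):
    """Per-class IoU between two label maps of identical shape.

    Returns one value per class, or None for a class that appears in
    neither the prediction nor the ground truth.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel correctly labeled: counts once in both sets.
                inter[p] += 1
                union[p] += 1
            else:
                # Mislabeled pixel: enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    return [inter[c] / union[c] if union[c] else None
            for c in range(num_classes)]


truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
print(per_class_iou(pred, truth, num_classes=2))  # [0.666..., 0.75]
```

Averaging the per-class values gives a single score that does not let a dominant class, such as background, hide poor performance on rarer classes.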

Future of Pascal VOC

Object class recognition is progressing rapidly, and the requirements for a benchmark are evolving with it. The following aspects could be improved or added in future VOC challenges.

Object classes

The most obvious extension is to increase the number of annotated object classes. A key motivation is to put more emphasis on scalability: running one detector per object class is currently the most popular approach, but it may not remain feasible as the number of classes grows. More classes would also encourage research into discriminating between visually similar classes and exploiting semantic relationships between classes.

However, increasing the number of classes will make the VOC challenge more difficult to run:

  • it will be harder to collect enough data per class;
  • it raises questions about how to annotate objects accurately;
  • the evaluation of recognition must become more flexible.

Object parts

Annotation of body parts was introduced in VOC2007 to evaluate and encourage the development of methods capable of more detailed image annotation than object location alone. This finer-grained description of object parts is an important direction to pursue. Although many current methods start from local features, these features typically have little to do with the semantic parts of the objects. Yet object detection and recognition are often used to support interaction with objects. Making good use of recognition in that setting generally requires knowing where the parts are located, so part localization should form at least one component of the evaluation.

Stuff classes

VOC has so far been limited to object classes and annotations that allow discrete objects (“things”) to be recognized. With the inclusion of the segmentation taster, it is natural to also include “stuff” classes, and to consider how to annotate classes that are discrete things up close but can look like stuff at a distance. Images containing such ambiguities are currently excluded from the VOC dataset.


Video

The VOC competition has so far focused only on detecting and recognizing objects in still images. Including video clips would extend the challenge in several ways:

  • As training data, video would support learning richer object models. Footage of objects seen from varying viewpoints provides, through tracking, implicit correspondences between parts;
  • As test data, video would allow the evaluation of new tasks such as object recognition from video and action recognition. This would also bring the VOC challenge closer to other benchmarks, such as an interactive search task with a greater emphasis on actions.


Conclusion

Object class recognition has come a long way in the last decade. At the turn of the millennium, few would have predicted that the community would achieve such strong results in both classification and detection across such a wide range of object classes. This progress has gone hand in hand with the creation of image databases, which have supplied both the training data and the test data needed to measure performance gains. The VOC challenge has played a vital part in this effort, and we hope it will continue to do so.