
Object detection is a computer vision technique that locates and recognizes specific objects in a given image or video. It plays a significant role in numerous machine learning (ML) applications, including driverless vehicles, security systems, and augmented reality. The goal is both to identify and categorize visual objects and to localize their positions within the image or video.
The ability of algorithms to automatically detect and classify objects is a major goal in machine learning, since it lets systems process visual data at scale without requiring manual annotation. Object detection has multiple applications, including tracking moving objects, analyzing scenes, and labeling images.
ML researchers can only make progress with access to object detection datasets, which provide the training data from which algorithms learn and generate predictions. The efficiency and reliability of object detection algorithms depend on the quantity and variety of the available datasets.
Large volumes of data are required to train ML models effectively. Datasets expose models to the patterns in objects and their variations, improving the system’s ability to detect objects.
This article provides a comprehensive list of 2023’s top 15 public datasets for object detection. The datasets below vary in size, resolution, and object types, covering a wide range of object detection requirements. They were chosen because they support common object detection tasks and are backed by the large and diverse communities that built them.
COCO Dataset
Microsoft created COCO (Common Objects in Context), a large-scale image recognition dataset. It is widely used in computer vision and object detection research and is regarded as one of the best object detection datasets.
The MS COCO dataset is used to train and evaluate object detection and image recognition systems. Objects in each image are annotated with bounding boxes, providing a thorough training set for object detection algorithms. In addition, the dataset contains instance segmentation masks, which describe the shape of each object in the image.
Feature | Value |
---|---|
Total Images | 330,000 |
Total Classes | 91 |
Object Instances | 1.5 million |
Resolution | Up to 640 x 480 |
Types of Classes | Animals, vehicles, furniture, household items, etc. |
Evaluation Metrics | Average Precision (AP) and Average Recall (AR) at multiple IoU thresholds |
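For readers who want to work with these annotations directly, the snippet below is a minimal sketch using the official pycocotools API; the annotation file path is an assumption and must point to wherever you unpacked the COCO annotations.

```python
# pip install pycocotools
from pycocotools.coco import COCO

# Assumed path: adjust to where the COCO annotation JSON actually lives.
ANN_FILE = "annotations/instances_val2017.json"

coco = COCO(ANN_FILE)  # index the annotation file

# Look up the category id for "dog" and every image containing one.
dog_cat_id = coco.getCatIds(catNms=["dog"])
dog_img_ids = coco.getImgIds(catIds=dog_cat_id)

# Load the bounding boxes for the first such image.
ann_ids = coco.getAnnIds(imgIds=dog_img_ids[:1], catIds=dog_cat_id)
for ann in coco.loadAnns(ann_ids):
    print(ann["category_id"], ann["bbox"])  # bbox is [x, y, width, height] in pixels
```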
Pascal VOC
The Pascal Visual Object Classes (VOC) dataset is a benchmark for object detection and classification in computer vision. It was created by the Visual Object Classes (VOC) project, hosted at the University of Oxford, and has become a standard dataset for evaluating object detection algorithms. It includes images of its 20 object categories in various poses and backgrounds, making it a diverse and challenging dataset for object detection algorithms.
Feature | Value |
---|---|
Images | 20,000 |
Classes | 20 |
Object Instances | 27,000+ |
Resolution | 500×375 pixels |
Types of Classes | Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow, Dining Table, Dog, Horse, Motorbike, Person, Potted Plant, Sheep, Sofa, Train, TV Monitor |
Evaluation Metrics | Mean Average Precision (mAP), computed from per-class precision/recall curves |
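As a rough illustration of how the VOC annotations are typically consumed, here is a short sketch using torchvision's built-in loader; the root directory and year are assumptions you would adapt to your own setup.

```python
# pip install torch torchvision
from torchvision.datasets import VOCDetection

# Downloads VOC2012 trainval into ./data (assumed root) if it is not already there.
voc = VOCDetection(root="./data", year="2012", image_set="train", download=True)

image, target = voc[0]                      # PIL image + parsed XML annotation dict
for obj in target["annotation"]["object"]:  # one entry per labeled object
    name = obj["name"]                      # e.g. "dog", "person"
    box = obj["bndbox"]                     # xmin/ymin/xmax/ymax as strings
    print(name, box["xmin"], box["ymin"], box["xmax"], box["ymax"])
```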
ImageNet
ImageNet is a massive collection of labeled images that has become a key benchmark in computer vision and machine learning. Originally released in 2009, the dataset serves as a common benchmark for image classification and object recognition tasks.
Feature | Value |
---|---|
Total Images | 14,197,122 |
Total Classes | 1000 |
Object Instances | >14 million |
Resolution | Varies (commonly resized to 256 x 256 pixels) |
Types of Classes | Common objects, abstract concepts |
Evaluation Metrics | Top-1 accuracy, Top-5 accuracy |
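Since the table above reports Top-1 and Top-5 accuracy, the following sketch shows how those metrics are commonly computed from a batch of model outputs; the tensor shapes and toy values are illustrative assumptions, not part of the dataset itself.

```python
import torch

def top_k_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true label appears in the model's top-k predictions.

    logits: (batch, num_classes) raw scores; labels: (batch,) integer class ids.
    """
    topk = logits.topk(k, dim=1).indices            # (batch, k) highest-scoring classes
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # True where the label is in the top k
    return hits.float().mean().item()

# Toy example with 4 samples and 1,000 ImageNet-style classes.
logits = torch.randn(4, 1000)
labels = torch.randint(0, 1000, (4,))
print("top-1:", top_k_accuracy(logits, labels, k=1))
print("top-5:", top_k_accuracy(logits, labels, k=5))
```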
CIFAR 100
The CIFAR-100 image recognition dataset is widely used in machine learning research. It contains 100 classes with 600 images each, for a total of 60,000 images. Each of the 100 fine-grained classes (specific animals, vehicles, and everyday objects) belongs to one of 20 coarse-grained superclasses (such as mammals, fish, or vehicles).
CIFAR-100 is a challenging dataset because of its tiny image size and large number of classes, making it an excellent test for object recognition systems.
Feature | Value |
---|---|
Total Images | 60000 |
Total Classes | 100 |
Object Instances | – |
Resolution | 32×32 |
Types of Classes | Fine-grained (100) and Coarse-grained (20) |
Evaluation Metrics | Top-1 and Top-5 accuracy |
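The sketch below shows the usual way of pulling CIFAR-100 with torchvision; the download directory is an assumption.

```python
# pip install torch torchvision
from torchvision import datasets, transforms

# Downloads the dataset archive into ./data (assumed location) on first use.
train_set = datasets.CIFAR100(
    root="./data",
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)

image, fine_label = train_set[0]  # 3x32x32 tensor and one of the 100 fine labels
print(image.shape, train_set.classes[fine_label])
```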
Open Images V6
Open Images V6 is a publicly available dataset that can be used for object recognition, segmentation, and detection. Released in February 2020, the dataset includes annotated images with image-level labels and pixel-level segmentations. The images were gathered from various sources, primarily Flickr.
The bounding boxes on the images indicate the position and size of objects inside each image. The collection also contains visual relationship annotations, which describe associations between objects in an image, such as “person riding a horse” or “dog playing with a ball.”
Feature | Value |
---|---|
Total Images | 1,743,042 images (training), 41,620 images (validation), and 125,436 images (test). |
Total Classes | 600 |
Object Instances | 600,000 |
Resolution | Varies (256×256 to 2048×2048) |
Types of Classes | Animals, vehicles, furniture, kitchen appliances, sporting equipment, etc. |
Evaluation Metrics | mean Average Precision (mAP) and mean Precision @ Overlap (mP@O) |
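One convenient way to download a filtered slice of Open Images is through the FiftyOne dataset zoo; the sketch below follows that integration, and the split, classes, and sample count shown are assumptions you would adjust (argument names may differ slightly across FiftyOne versions).

```python
# pip install fiftyone
import fiftyone.zoo as foz

# Downloads a small, filtered slice of Open Images V6 with box annotations.
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections"],
    classes=["Dog", "Cat"],
    max_samples=100,
)
print(dataset)  # summary of samples and their bounding-box labels
```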
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute)
KITTI is one of the most popular mobile robotics and autonomous driving datasets. It consists of hours of traffic scenarios recorded with various sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner.
The dataset was developed by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago, and it includes images from diverse contexts such as urban and rural locations, highways, and intersections.
Feature | Value |
---|---|
Total Images | 7,481 training images and 7,518 test images |
Total Classes | 8 |
Object Instances | 80,256 |
Resolution | 1248×384 |
Types of Classes | Car, Van, Truck, Pedestrian, Person Sitting, Cyclist, Trams, Miscellaneous |
Evaluation Metrics | Average Precision (AP), Precision/Recall curve, Detection rate |
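KITTI's 2D detection labels are plain text files with one object per line; the small parser below is a sketch of how they are commonly read (the file path is an assumption, and only the fields relevant to 2D detection are kept).

```python
from pathlib import Path

def read_kitti_labels(label_file: str):
    """Parse a KITTI label_2 .txt file into (class_name, [left, top, right, bottom]) pairs."""
    objects = []
    for line in Path(label_file).read_text().splitlines():
        fields = line.split()
        name = fields[0]                 # e.g. Car, Pedestrian, Cyclist, DontCare
        if name == "DontCare":
            continue                     # regions ignored during evaluation
        left, top, right, bottom = map(float, fields[4:8])  # 2D box in image pixels
        objects.append((name, [left, top, right, bottom]))
    return objects

# Assumed path into the KITTI training split.
print(read_kitti_labels("training/label_2/000000.txt"))
```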
The Labelbox Open Images Dataset
Open Images by Labelbox is a large-scale image recognition dataset, made available through Labelbox's data annotation platform and based on the Open Images dataset originally released by Google. The images in the dataset come from various sources and vary in quality, object scale, and scene complexity.
Feature | Value |
---|---|
Total Images | 9 million |
Total Classes | 6000 |
Object Instances | 600,000 |
Resolution | Various (1024×1024 Average) |
Types of Classes | People, animals, vehicles, everyday objects |
Evaluation Metrics | Mean Average Precision (mAP) |
The GluonCV Object Detection Dataset
The GluonCV Object Detection dataset is a large-scale object detection dataset provided through GluonCV, an open-source computer vision toolkit built on the MXNet deep learning framework. It offers practitioners and researchers a large, high-quality dataset for training and evaluating object detection algorithms.
Feature | Value |
---|---|
Total Images | 160,000 |
Total Classes | 80 |
Object Instances | Over 1 million |
Resolution | 512×512 pixels |
Types of Classes | Animals, vehicles, common objects (chairs, tables, bottles, etc.) |
Evaluation Metrics | Mean Average Precision (mAP), Average Precision (AP) at various Intersection over Union (IoU) thresholds |
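Because GluonCV bundles data wrappers and pretrained detectors together, the sketch below shows the library's typical usage pattern; the test image path is an assumed placeholder.

```python
# pip install mxnet gluoncv matplotlib
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils

# Pretrained SSD detector from the GluonCV model zoo (trained on the Pascal VOC classes).
net = model_zoo.get_model("ssd_512_resnet50_v1_voc", pretrained=True)

# Load and preprocess a test image (the filename here is an assumed placeholder).
x, img = data.transforms.presets.ssd.load_test("street.jpg", short=512)

class_ids, scores, bounding_boxes = net(x)
utils.viz.plot_bbox(img, bounding_boxes[0], scores[0], class_ids[0], class_names=net.classes)
plt.show()
```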
DOTA (Dataset for Object Detection in Aerial Images)
DOTA (Dataset for Object Detection in Aerial Images) is a large-scale dataset of aerial images for object detection. It contains objects in varied orientations and sizes and was created to advance object detection algorithms for aerial imagery.
Objects in the DOTA dataset are annotated with precise polygonal bounding boxes, which makes the dataset useful for both classification and detection applications.
Feature | Value |
---|---|
Total Images | 2,806 |
Total Classes | 15 |
Object Instances | 188,282 |
Resolution | 3,000 x 3,000 pixels |
Types of Classes | Plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, swimming pool |
Evaluation Metrics | Mean Average Precision (mAP) and the F1-score |
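Since DOTA uses polygonal boxes rather than axis-aligned ones, its labels look different from VOC or COCO. The parser below is a sketch assuming the DOTA-v1.0 plain-text layout (eight polygon coordinates, a class name, and a difficulty flag per line, possibly preceded by metadata lines); the file path is an assumption.

```python
def read_dota_annotations(label_file: str):
    """Parse a DOTA-v1.0 label file into polygon, category, and difficulty records."""
    objects = []
    with open(label_file) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) < 10:            # skip metadata lines such as 'imagesource:' / 'gsd:'
                continue
            polygon = [float(v) for v in parts[:8]]  # x1 y1 x2 y2 x3 y3 x4 y4
            category = parts[8]                      # e.g. plane, ship, harbor
            difficult = int(parts[9])                # 1 = hard instance
            objects.append({"polygon": polygon, "category": category, "difficult": difficult})
    return objects

# Assumed path to one DOTA label file.
print(read_dota_annotations("labelTxt/P0001.txt")[:3])
```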
The ObjectNet3D Dataset
The ObjectNet3D benchmark dataset is a large-scale 3D object recognition and detection dataset. The collection comprises images of objects captured from a wide range of viewpoints and aligned with 3D shape annotations. ObjectNet3D is intended to cover a variety of objects and settings, such as ordinary household goods, furniture, electronics, and tools.
The images were compiled in real-world conditions, making the dataset well suited for testing object detection algorithms in real-world applications.
Parameter | Value |
---|---|
Total Images | 100,000+ |
Total Classes | 200 |
Object Instances | 4,963,853 |
Resolution | Various |
Types of Classes | Household items, Furniture, Electronics, Tools |
Evaluation Metrics | Average Precision (AP), Mean Average Precision (mAP) |
Udacity Self Driving Car Dataset
The original Udacity Self Driving Car Dataset had a significant number of missing labels for pedestrians, bikers, cars, and traffic lights, leading to potential inaccuracies and poor model performance. This could even have dangerous consequences when applied to self-driving car technology.
A corrected, re-labeled version of the dataset is available for download in Pascal VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and other formats, providing greater convenience and accessibility.
Feature | Value |
---|---|
Total Images | 15,000 |
Total Classes | 11 |
Object Instances | 97,942 |
Resolution | 1920×1200 |
Types of Classes | Car, traffic-light, truck, biker, pedestrian |
Evaluation Metrics | Precision, Recall, F1 score |
BDD100K Dataset
BDD100K is a massive video and image dataset used in computer vision development and research. It was developed by UC Berkeley and Baidu Research and made public in 2017.
Feature | Value |
---|---|
Total Images | 100,000 |
Total Classes | 10 |
Object Instances | 1.8M |
Resolution | 1280×720 |
Types of Classes | Pedestrian, rider, car, truck, bus, train, motorcycle, bicycle, traffic light, traffic sign |
Evaluation Metrics | Mean Average Precision |
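BDD100K ships its detection annotations as JSON. The helper below is a sketch assuming the released per-image layout (a list of records, each with a `labels` list whose entries may carry a `box2d` dict); the label file name is an assumption.

```python
import json

def read_bdd100k_boxes(label_file: str):
    """Extract 2D boxes from a BDD100K detection label JSON file."""
    with open(label_file) as fh:
        records = json.load(fh)

    boxes_per_image = {}
    for record in records:
        boxes = []
        for label in record.get("labels", []):
            box = label.get("box2d")
            if box is None:          # skip annotations without a 2D box (e.g. lane markings)
                continue
            boxes.append((label["category"], box["x1"], box["y1"], box["x2"], box["y2"]))
        boxes_per_image[record["name"]] = boxes
    return boxes_per_image

# Assumed path to the validation detection labels.
boxes = read_bdd100k_boxes("bdd100k_labels_images_val.json")
```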
Visual Genome Dataset
The Visual Genome dataset is a large-scale dataset for research in computer vision. Produced by Stanford University, it comprises a large collection of images and labels for object recognition and scene interpretation.
Feature | Value |
---|---|
Total Images | 108,000 |
Total Classes | 1,000+ |
Object Instances | 3.8 million |
Resolution | Varies, mainly 500×500 to 1000×1000 pixels |
Types of Classes | People, animals, vehicles, furniture, and more |
Common Tasks | Object recognition, scene understanding |
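Visual Genome's object annotations are distributed as JSON. The sketch below assumes the `objects.json` layout described on the dataset's download page (per-image records with an `objects` list holding names and x/y/w/h boxes); adjust the field names if your copy differs.

```python
import json

# Assumed file name from the Visual Genome download page.
with open("objects.json") as fh:
    images = json.load(fh)

first = images[0]
print("image id:", first["image_id"])
for obj in first["objects"][:5]:
    # Each object carries a box (x, y, w, h) and one or more name strings.
    print(obj["names"], obj["x"], obj["y"], obj["w"], obj["h"])
```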
nuScenes Dataset
nuScenes is a public dataset for autonomous vehicle perception developed by the autonomous vehicle technology company nuTonomy (now owned by Aptiv). The dataset contains a wide range of data captured from real-world autonomous vehicles, including high-resolution LIDAR and camera data with corresponding annotations. It comprises 1,000 scenes, each lasting 20 seconds, with LIDAR captured at 20 Hz and object annotations provided on keyframes sampled at 2 Hz.
Feature | Value |
---|---|
Total Images | 45K |
Total Classes | 10 |
Object Instances | 1.4M+ |
Resolution | 1920 x 1080 pixels |
Types of Classes | cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones, and barriers. |
Evaluation Metrics | mAP (object detection), IoU (semantic segmentation and depth estimation) |
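A minimal sketch with the official nuscenes-devkit is shown below, assuming the v1.0-mini split has been downloaded and extracted to the path shown.

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

# Assumed data root containing the extracted v1.0-mini split.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)

scene = nusc.scene[0]
sample = nusc.get("sample", scene["first_sample_token"])  # first annotated keyframe

# Each annotation is a 3D box with a category, size, and position.
for ann_token in sample["anns"][:5]:
    ann = nusc.get("sample_annotation", ann_token)
    print(ann["category_name"], ann["size"], ann["translation"])
```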
20BN-SOMETHING-SOMETHING V2 dataset
The 20BN-SOMETHING-SOMETHING V2 dataset is a large-scale video recognition dataset that also contains a wide range of objects and actions. It is a follow-up version of the original 20BN-SOMETHING-SOMETHING dataset and was created to improve the quality and diversity of the data.
Feature | Value |
---|---|
Total Videos | 220,847 videos, with 168,913 in the training set and 24,777 in the validation set |
Total Classes | 174 |
Object Instances | 1.2 million+ |
Resolution | 100×100 pixels |
Types of Classes | Everyday objects and actions |
Evaluation Metrics | Accuracy, F1 Score, Precision, Recall |
Key Takeaways
- Many machine learning applications depend heavily on object detection, which requires large amounts of labeled data to train deep learning models.
- Object detection datasets give deep learning algorithms the training data they need to recognize and localize objects in images and videos.
- The accuracy and resilience of object detection algorithms are significantly influenced by the quality and diversity of the datasets used for the task.
- Researchers may train algorithms that can effectively detect and categorize objects in various circumstances by employing diverse and high-quality datasets, making them more useful in real-world applications.
- It is crucial to consider attributes like the dataset’s size and diversity, the correctness of the annotations, and the availability of pre-trained models when choosing a dataset for object detection.
Conclusion
With the support of advanced research and large public datasets, the field of object detection has made remarkable progress in recent years. These datasets have been critical in helping researchers and developers train, evaluate, and improve their object detection algorithms. With continuing research and development, the quantity and diversity of publicly available datasets for object detection will only increase, making it easier for everyone to collaborate on novel solutions to real-world problems. So, whether you are a researcher, a developer, or simply a curious enthusiast, keep looking for new, high-quality public datasets to build state-of-the-art object detection models.