Top 15 Public Datasets for Object Detection in 2023

Object detection is a computer vision technique that locates and recognizes specific objects in an image or video. It plays a significant role in numerous machine learning (ML) applications, including driverless vehicles, security systems, and augmented reality. A detector both categorizes each visual item and reports its position within the image or video frame.

[Figure: Object detection algorithm. Source: Object detection algorithm]

Automatically detecting and classifying objects is a major goal in machine learning. A reliable detector can also reduce the manual annotation needed elsewhere, so ML algorithms can be trained more quickly and effectively. Object detection has multiple applications, including tracking moving objects, analyzing scenes, and labeling pictures.

ML researchers depend on object detection datasets, which provide the training data that algorithms learn from before they generate predictions. The accuracy and reliability of object detection algorithms depend on the quantity and variety of the available data.

Large volumes of data are required to effectively train ML models. Datasets teach the machines the patterns in objects and their variations, improving the system’s ability to detect objects.

This article lists 15 of the top public datasets for object detection in 2023. The datasets below vary in size, resolution, and class types, so together they cover a wide range of object detection requirements. They were chosen because they suit common object detection tasks and are backed by large, diverse communities.

COCO Dataset

Microsoft created COCO (Common Objects in Context), a large-scale image recognition dataset. It is widely used in computer vision and object detection research and is regarded as one of the best object detection datasets.

The MS COCO dataset is used to evaluate object detection and image recognition systems. Objects in the images are annotated with labeled bounding boxes, providing a thorough training set for object detection algorithms. In addition, the dataset contains instance segmentation masks, which provide information about the shape of objects in the image.

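| Feature | Value |
| --- | --- |
| Total Images | 330,000 |
| Total Classes | 80 annotated object categories (91 defined in the original paper) |
| Object Instances | 1.5 million |
| Resolution | Up to 640×480 |
| Types of Classes | Animals, vehicles, furniture, household items, etc. |
| Evaluation Metrics | Average Precision (AP) and Average Recall (AR) |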
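
As a rough illustration of what a bounding-box annotation looks like in practice, the sketch below reads a tiny, made-up annotation file laid out in the COCO JSON style and converts each box from COCO's [x, y, width, height] convention to corner coordinates. The file contents, the image name, and the helper names (`xywh_to_xyxy`, `load_boxes`) are assumptions for illustration and are not part of any official tooling.

```python
import json

# A tiny, made-up annotation file in the COCO JSON layout (values are illustrative).
coco_json = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 101, "image_id": 1, "category_id": 18,
     "bbox": [100.0, 120.0, 200.0, 150.0], "iscrowd": 0}
  ]
}
"""


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] (top-left origin) to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def load_boxes(coco_text):
    """Group (class name, corner box) pairs by image id."""
    data = json.loads(coco_text)
    names = {c["id"]: c["name"] for c in data["categories"]}
    boxes = {}
    for ann in data["annotations"]:
        entry = (names[ann["category_id"]], xywh_to_xyxy(ann["bbox"]))
        boxes.setdefault(ann["image_id"], []).append(entry)
    return boxes


boxes = load_boxes(coco_json)
print(boxes)
```

Keeping the conversion in one small function helps when mixing datasets, since other formats in this list store boxes as corner coordinates instead.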


Pascal VOC

The Pascal Visual Object Classes (VOC) dataset is a benchmark for object detection and classification in computer vision. It was created by the Visual Object Classes (VOC) project at the University of Oxford and has become a standard dataset for evaluating object detection algorithms. It includes images of 20 object classes in various poses and backgrounds, making it a diverse and challenging dataset for object detection algorithms.

| Feature | Value |
| --- | --- |
| Total Images | 20,000 |
| Total Classes | 20 |
| Object Instances | 27,000+ |
| Resolution | 500×375 pixels |
| Types of Classes | Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow, Dining Table, Dog, Horse, Motorbike, Person, Potted Plant, Sheep, Sofa, Train, TV Monitor |
| Evaluation Metrics | Mean Average Precision (mAP), Average Number of Correct Detections (ANCD) |
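
Mean Average Precision is the mean of the per-class Average Precision (AP) values. The sketch below shows one common way to compute AP for a single class, as the area under an interpolated precision-recall curve. It assumes each detection has already been marked as a true or false positive, and the function names and sample detections are hypothetical. The exact interpolation differs between benchmark versions, so treat this as an illustration and not as the official evaluation code.

```python
def average_precision(detections, num_ground_truth):
    """AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of labeled objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Walk through detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Pad the curve, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum the rectangular areas under the interpolated curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(ap_per_class):
    """Average the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


ap_dog = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
ap_cat = average_precision([(0.95, True)], num_ground_truth=1)
m_ap = mean_average_precision({"dog": ap_dog, "cat": ap_cat})
print(round(ap_dog, 4), round(ap_cat, 4), round(m_ap, 4))
```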



ImageNet

ImageNet is a massive collection of labeled images that has become a key benchmark in computer vision and machine learning. Originally made available in 2009, the dataset now serves as a common benchmark for object recognition and image classification tasks.

| Feature | Value |
| --- | --- |
| Total Images | 14,197,122 |
| Total Classes | 1,000 |
| Object Instances | >14 million |
| Resolution | 256×256 pixels |
| Types of Classes | Common objects, abstract concepts |
| Evaluation Metrics | Top-1 accuracy, Top-5 accuracy |
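
Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true label, while Top-5 accuracy accepts the true label anywhere among the five highest-scoring classes. A minimal sketch of the general top-k calculation follows; the function name and the three-class score table are made up for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores for each sample.
    labels: the true class index for each sample.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)


# Hypothetical scores for three samples over three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

With 1,000 classes the same function would be called with `k=5` to obtain Top-5 accuracy.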



CIFAR-100

The CIFAR-100 image recognition dataset is widely used in machine learning research. It has 100 classes with 600 images each, for a total of 60,000. The 100 fine-grained classes, which cover animals, vehicles, and everyday objects, are grouped into 20 coarse-grained superclasses.

CIFAR-100 is a hard dataset because of its tiny image size and large number of classes, making it an excellent test for object recognition systems.

| Feature | Value |
| --- | --- |
| Total Images | 60,000 |
| Total Classes | 100 |
| Object Instances | |
| Resolution | 32×32 |
| Types of Classes | Fine-grained (100) and Coarse-grained (20) |
| Evaluation Metrics | Top-1 and Top-5 accuracy |


Open Images V6

Open Images V6 is a publicly available dataset that can be used for object recognition, segmentation, and detection. Released in February 2020, the dataset includes images annotated with labels and segmentation masks. The images in the dataset were gathered from various sources, including Flickr, Wikipedia, and the Open Images website.

The bounding boxes on the images in the dataset indicate the position and size of objects inside the image. The collection also contains visual relationship annotations, which describe how objects in the image relate to one another, such as “person riding a horse” or “dog playing with a ball.”

| Feature | Value |
| --- | --- |
| Total Images | 1,743,042 (training), 41,620 (validation), 125,436 (test) |
| Total Classes | 600 |
| Object Instances | 600,000 |
| Resolution | Varies (256×256 to 2048×2048) |
| Types of Classes | Animals, vehicles, furniture, kitchen appliances, sporting equipment, etc. |
| Evaluation Metrics | Mean Average Precision (mAP) and mean Precision @ Overlap (mP@O) |


KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute)

KITTI is one of the most popular mobile robotics and autonomous driving datasets. It consists of hours of traffic scenarios recorded with various sensor modalities, including high-resolution color and grayscale stereo cameras and a 3D laser scanner.

The Karlsruhe Institute of Technology and the Toyota Technological Institute developed this dataset, which includes images from diverse contexts such as urban and rural locations, highways, and crossroads.

| Feature | Value |
| --- | --- |
| Total Images | 7,481 training images and 7,518 test images |
| Total Classes | 8 |
| Object Instances | 80,256 |
| Resolution | 1248×384 |
| Types of Classes | Car, Van, Truck, Pedestrian, Person Sitting, Cyclist, Tram, Miscellaneous |
| Evaluation Metrics | Average Precision (AP), Precision/Recall curve, Detection rate |


The Labelbox Open Images Dataset

Open Images by Labelbox is a large-scale image recognition dataset. OpenAI developed it in partnership with Labelbox, a data annotation platform. The images come from various sources and vary in quality, object scale, and scene complexity.

| Feature | Value |
| --- | --- |
| Total Images | 9 million |
| Total Classes | 6,000 |
| Object Instances | 600,000 |
| Resolution | Various (1024×1024 average) |
| Types of Classes | People, animals, vehicles, everyday objects |
| Evaluation Metrics | Mean Average Precision (mAP) |


The GluonCV Object Detection Dataset

The GluonCV Object Detection dataset is a large-scale object detection dataset provided through GluonCV, an open-source computer vision library built on the MXNet deep learning framework. It offers practitioners and academics a massive, high-quality dataset for training and assessing object detection algorithms.

| Feature | Value |
| --- | --- |
| Total Images | 160,000 |
| Total Classes | 80 |
| Object Instances | Over 1 million |
| Resolution | 512×512 pixels |
| Types of Classes | Animals, vehicles, common objects (chairs, tables, bottles, etc.) |
| Evaluation Metrics | Mean Average Precision (mAP), Average Precision (AP) at various Intersection over Union (IoU) thresholds |
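
Intersection over Union (IoU) measures how much a predicted box overlaps a ground-truth box, and a detection counts as correct only when its IoU reaches the chosen threshold. The sketch below computes IoU for two corner-format boxes and checks the result against the COCO-style thresholds 0.50 to 0.95 in steps of 0.05. The function name and the sample boxes are illustrative assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 9, 8)
overlap = iou(ground_truth, prediction)

# Thresholds 0.50, 0.55, ..., 0.95, built from integers to avoid rounding drift.
thresholds = [t / 100 for t in range(50, 100, 5)]
passed = [t for t in thresholds if overlap >= t]
print(overlap, passed)
```

A stricter threshold rewards tighter localization, which is why AP is often reported at several thresholds instead of one.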


DOTA (Dataset for Object Detection in Aerial Images)

DOTA (Dataset for Object Detection in Aerial Images) is a large-scale dataset of aerial images for object detection. The dataset contains a variety of objects in varied orientations and sizes and was created to improve object recognition algorithms for aerial images.

Objects in the DOTA dataset are annotated with precise polygonal bounding boxes, which makes them helpful for classification and detection applications.

| Feature | Value |
| --- | --- |
| Total Images | 2,806 |
| Total Classes | 15 |
| Object Instances | 188,282 |
| Resolution | 3,000×3,000 pixels |
| Types of Classes | Plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, swimming pool |
| Evaluation Metrics | Mean Average Precision (mAP) and the F1-score |
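
Polygonal boxes follow the orientation of an object, whereas most detectors expect axis-aligned boxes. The sketch below parses one annotation line, computes the axis-aligned box that encloses the polygon, and compares the two areas with the shoelace formula. The line layout (eight corner coordinates followed by a category name and a difficulty flag) and the helper names are assumptions made for illustration.

```python
def parse_polygon_line(line):
    """Split an assumed 'x1 y1 x2 y2 x3 y3 x4 y4 category difficult' line."""
    parts = line.split()
    coords = [float(v) for v in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, parts[8]


def enclosing_box(points):
    """Smallest axis-aligned (xmin, ymin, xmax, ymax) containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A made-up object rotated by 45 degrees (a diamond shape).
points, category = parse_polygon_line("10 0 20 10 10 20 0 10 plane 0")
box = enclosing_box(points)
box_area = (box[2] - box[0]) * (box[3] - box[1])
print(category, box, polygon_area(points), box_area)
```

In this example the axis-aligned box covers twice the area of the polygon, which illustrates why oriented annotations matter for rotated objects in aerial imagery.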


The ObjectNet3D Dataset

The ObjectNet3D benchmark dataset is a large-scale 3D object recognition and detection dataset. The collection comprises photos of 3D objects collected from the top, bottom, front, and rear perspectives. The ObjectNet3D collection is intended to include various objects and settings, such as ordinary home goods, furniture, electronics, and tools.

The images were captured in real-world conditions, which makes the dataset well suited to testing object detection algorithms for real-world applications.

| Feature | Value |
| --- | --- |
| Total Images | 100,000+ |
| Total Classes | 200 |
| Object Instances | 4,963,853 |
| Resolution | Various |
| Types of Classes | Household items, Furniture, Electronics, Tools |
| Evaluation Metrics | Average Precision (AP), Mean Average Precision (mAP) |


Udacity Self Driving Car Dataset

The original Udacity Self Driving Car Dataset had a significant number of missing labels for pedestrians, bikers, cars, and traffic lights, leading to potential inaccuracies and poor model performance. This could even have dangerous consequences when applied to self-driving car technology.

The dataset, however, is available for download in VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and other formats, which makes it convenient to use with most training pipelines.
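
To give a feel for one of these formats, the following sketch parses a small, made-up annotation in the VOC XML layout using only the Python standard library and converts each box to the [x, y, width, height] form used by COCO JSON. The file name, box coordinates, and helper names are hypothetical and do not come from the actual dataset.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the VOC XML layout (values are illustrative).
voc_xml = """<annotation>
  <filename>frame_0001.jpg</filename>
  <size><width>1920</width><height>1200</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>350</ymax></bndbox>
  </object>
  <object>
    <name>pedestrian</name>
    <bndbox><xmin>900</xmin><ymin>500</ymin><xmax>950</xmax><ymax>640</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return the file name and a list of (class name, corner box) pairs."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        corners = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), corners))
    return root.findtext("filename"), objects


def corners_to_xywh(corners):
    """Convert (xmin, ymin, xmax, ymax) to [x, y, width, height]."""
    xmin, ymin, xmax, ymax = corners
    return [xmin, ymin, xmax - xmin, ymax - ymin]


filename, objects = parse_voc(voc_xml)
coco_style = [(name, corners_to_xywh(c)) for name, c in objects]
print(filename, coco_style)
```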

| Feature | Value |
| --- | --- |
| Total Images | 15,000 |
| Total Classes | 11 |
| Object Instances | 97,942 |
| Resolution | 1920×1200 |
| Types of Classes | Car, traffic-light, truck, biker, pedestrian |
| Evaluation Metrics | Precision, Recall, F1 score |
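
Precision, recall, and F1 score are all derived from counts of true positives, false positives, and false negatives. The sketch below computes them and then shows, with made-up counts, how missing labels can distort the numbers: a correct detection of an unlabeled pedestrian is scored as a false positive. The function name and all counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical evaluation against complete labels.
complete = precision_recall_f1(tp=8, fp=2, fn=2)

# Same detector, but three correctly detected objects have no label,
# so those three detections are counted as false positives instead.
incomplete = precision_recall_f1(tp=5, fp=5, fn=2)

print([round(v, 3) for v in complete])
print([round(v, 3) for v in incomplete])
```

The detector is identical in both cases; only the label quality changes, which is why annotation completeness matters when comparing reported scores.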


BDD100K Dataset

BDD100K is a massive video and image dataset used in computer vision development and research. It was developed by UC Berkeley and Baidu Research and made public in 2017.

| Feature | Value |
| --- | --- |
| Total Images | 100,000 |
| Total Classes | 10 |
| Object Instances | 1.8M |
| Resolution | 1280×720 |
| Types of Classes | Car, truck, bus, motorcycle, pedestrian, bicycle, traffic light, traffic sign, a person riding a bicycle or motorcycle |
| Evaluation Metrics | Mean Average Precision |


Visual Genome Dataset

The Visual Genome dataset is a large-scale dataset for research in computer vision. Produced by Stanford University, it comprises a large collection of images and labels for object recognition and scene interpretation.

| Feature | Value |
| --- | --- |
| Total Images | 108,000 |
| Total Classes | 1,000+ |
| Object Instances | 3.8 million |
| Resolution | Varies, mainly 500×500 to 1000×1000 pixels |
| Types of Classes | People, animals, vehicles, furniture, and more |
| Evaluation Metrics | Object recognition, scene understanding |


nuScenes Dataset

nuScenes is a public dataset for autonomous vehicle perception developed by the autonomous vehicle technology company nuTonomy (now owned by Aptiv). The dataset contains a wide range of data captured from real-world autonomous vehicles, including high-resolution LIDAR and camera data with corresponding annotations. The nuScenes dataset contains 1,000 scenes, each lasting 20 seconds and captured at 20Hz.

| Feature | Value |
| --- | --- |
| Total Images | 45K |
| Total Classes | 10 |
| Object Instances | 1.4M+ |
| Resolution | 1920×1080 pixels |
| Types of Classes | Cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones, and barriers |
| Evaluation Metrics | mAP (object detection), IoU (semantic segmentation and depth estimation) |



20BN-SOMETHING-SOMETHING V2 Dataset

The 20BN-SOMETHING-SOMETHING V2 dataset is a large-scale video recognition dataset that covers a wide range of objects and actions. It is a follow-up to the original 20BN-SOMETHING-SOMETHING dataset and was created to improve the quality and diversity of the data.

| Feature | Value |
| --- | --- |
| Total Videos | 220,847 videos, with 168,913 in the training set and 24,777 in the validation set |
| Total Classes | 174 |
| Object Instances | 1.2 million+ |
| Resolution | 100×100 pixels |
| Types of Classes | Everyday objects and actions |
| Evaluation Metrics | Accuracy, F1 Score, Precision, Recall |


Key Takeaways

  • Many machine learning applications depend heavily on object detection, and training deep learning models for it requires a lot of labeled data.
  • Datasets for object detection give deep learning algorithms the training data they need to recognize and locate objects in photos and videos.
  • The accuracy and resilience of object detection algorithms are significantly influenced by the quality and diversity of the datasets used for the task.
  • Researchers may train algorithms that can effectively detect and categorize objects in various circumstances by employing diverse and high-quality datasets, making them more useful in real-world applications.
  • It is crucial to consider attributes like the dataset’s size and diversity, the correctness of the annotations, and the availability of pre-trained models when choosing a dataset for object detection.


With the support of advanced research and large public datasets, the domain of object detection has made remarkable progress in recent years. These datasets have been critical in helping academics and developers train, evaluate, and improve their object detection algorithms. With continued research and development, the quantity and diversity of publicly available datasets for object detection are expected to keep growing, making it simpler for everyone to collaborate on novel solutions to real-world problems. So, whether you are a researcher, developer, or simply a curious enthusiast, keep looking for new, high-quality public datasets to build state-of-the-art object detection models.