
Training data is essential to building an effective machine learning (ML) model. The practice of carefully constructing datasets to improve model performance is known as data-centric AI, and a large public dataset is often an excellent place to start.
There is an ocean of freely accessible public datasets, so we have narrowed this list down to the top picks for the most common ML tasks.
This list focuses on object detection, one of the most widespread tasks in computer vision, with applications ranging from retail to the automotive industry.
1. COCO
Microsoft’s COCO is a massive dataset that displays common objects in their native context.
First released in 2014, the dataset contains around 1.5 million object instances across more than 330,000 images. The objects appear in realistic, often cluttered, scenes, and over 200,000 of the images are fully annotated.
COCO includes several forms of annotations, including bounding boxes, human keypoints, and panoptic segmentation. Due to its extensive size and annotations, it is the go-to dataset for competitions and benchmarks.
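To give a sense of how these annotations are typically consumed, here is a minimal sketch using the pycocotools library; the annotation file path is an assumption and should be adjusted to your local copy.

```python
# Minimal sketch: reading COCO bounding-box annotations with pycocotools.
from pycocotools.coco import COCO

# Hypothetical local path to the standard instances annotation file.
coco = COCO("annotations/instances_val2017.json")

# Take the first image and fetch its object annotations.
img_id = coco.getImgIds()[0]
img_info = coco.loadImgs(img_id)[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

for ann in anns:
    category = coco.loadCats(ann["category_id"])[0]["name"]
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    print(f"{img_info['file_name']}: {category} at ({x:.0f}, {y:.0f}), {w:.0f}x{h:.0f}")
```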
Basic characteristics:
1.) Size: around 330,000 images with 1,500,000 labeled object instances across just over 90 categories
2.) Includes annotations
3.) Type of license: Creative Commons Attribution 4.0 (CC BY 4.0)
2. PASCAL VOC
Since 2005, PASCAL VOC has provided standardized image datasets for object class recognition and has served as the basis for a series of challenges (see the leaderboards).
The most recent edition of the challenge, released in 2012, covers 20 object categories including animals, vehicles, and household items. Each image is annotated with object class labels, bounding boxes, and pixel-level semantic segmentation.
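As an illustration of that annotation format, here is a minimal sketch that parses one VOC XML file with the Python standard library; the file path is an assumption.

```python
# Minimal sketch: parsing a PASCAL VOC XML annotation file.
import xml.etree.ElementTree as ET

# Hypothetical path; VOC annotations live under Annotations/<image_id>.xml.
tree = ET.parse("VOCdevkit/VOC2012/Annotations/2007_000027.xml")
root = tree.getroot()

for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (int(float(box.find(t).text))
                              for t in ("xmin", "ymin", "xmax", "ymax"))
    print(f"{name}: ({xmin}, {ymin}) to ({xmax}, {ymax})")
```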
Basic characteristics:
1.) Size: more than 1,400 images each for training and validation
2.) Includes annotations
3.) Type of license: Custom
3. ImageNet
It’s one of the most well-known publicly available datasets for visual object detection. Work on ImageNet began in 2007, using WordNet as a foundation.
The collection has over 14 million carefully classified photos in over 20,000 categories, making it one of the broadest taxonomies among computer vision datasets.
The dataset rose to prominence in 2012, when AlexNet achieved a top-5 error rate of about 15% in the ILSVRC competition. That result, roughly 11 percentage points ahead of the nearest rival, made headlines and signaled that neural networks would soon surpass human vision on object recognition tests.
ImageNet continues to be a standard for researchers and practitioners in visual object recognition.
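Since the ILSVRC results above are quoted as top-5 error, here is a minimal sketch of how that metric is computed; the scores and labels are made-up toy values, not real ImageNet predictions.

```python
# Minimal sketch: computing a top-5 error rate from per-class scores.
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_classes = 1_000, 1_000
scores = rng.normal(size=(num_samples, num_classes))      # model confidence per class
labels = rng.integers(0, num_classes, size=num_samples)   # ground-truth class indices

# A prediction counts as correct if the true class is among the 5 highest scores.
top5 = np.argsort(scores, axis=1)[:, -5:]
correct = (top5 == labels[:, None]).any(axis=1)
print(f"top-5 error: {1.0 - correct.mean():.3f}")
```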
Basic characteristics:
1.) Size: over 14,000,000 images labeled across more than 20,000 categories
2.) Includes annotations
3.) Type of license: Custom
4. Visual Genome
Visual Genome is a large image dataset containing over 100,000 annotated images, and it serves as a common benchmark for object recognition and scene understanding.
Visual Genome is designed to support question answering and to describe the relationships among all the objects in an image. The ground truth annotations contain over 1.7 million question-answer pairs, an average of 17 questions (who, what, where, when, why, and how) per image.
All objects, attributes, and relationships are mapped to WordNet synsets.
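For a feel of the released JSON files, here is a minimal sketch that reads the object annotations; the file name and field names (image_id, objects, x/y/w/h, names) are assumptions based on the public objects.json layout and should be checked against the version you download.

```python
# Minimal sketch: reading Visual Genome object annotations from objects.json.
import json

with open("objects.json") as f:  # hypothetical local path
    images = json.load(f)

entry = images[0]
print("image_id:", entry["image_id"])
for obj in entry["objects"][:5]:
    name = obj["names"][0] if obj["names"] else "unknown"  # assumed field layout
    print(f"  {name}: x={obj['x']}, y={obj['y']}, w={obj['w']}, h={obj['h']}")
```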
Basic characteristics:
1.) Size: over 108,000 images and 3,800,000 object instances
2.) Includes annotations
3.) Type of license: Creative Commons Attribution 4.0 (CC BY 4.0)
5. DOTA v2.0
DOTA is a widely used dataset for object detection in aerial images, compiled from a range of sources, sensors, and platforms.
Image sizes range from roughly 800 pixels to tens of thousands of pixels on a side, and the objects span a wide variety of scales, orientations, and shapes. The dataset is updated regularly and is expected to keep growing.
The ground truth annotations for the aerial imagery are produced by experienced annotators across 18 categories. Objects are labeled with oriented bounding boxes (OBB), and each instance carries a difficulty flag indicating how challenging it is to detect.
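To make the OBB format concrete, here is a minimal sketch of a parser for one DOTA label file; the file name is an assumption, and the handling of metadata header lines reflects the public releases as I understand them.

```python
# Minimal sketch: parsing a DOTA OBB label file (one object per line:
# eight polygon coordinates, a category name, and a difficulty flag).
def parse_dota_labels(path):
    objects = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 10:  # skip metadata lines such as "imagesource:" or "gsd:"
                continue
            corners = [float(v) for v in parts[:8]]        # x1 y1 x2 y2 x3 y3 x4 y4
            category, difficult = parts[8], int(parts[9])  # class name, difficulty flag
            objects.append({"corners": corners,
                            "category": category,
                            "difficult": difficult})
    return objects

print(parse_dota_labels("labelTxt/P0001.txt")[:3])  # hypothetical label file path
```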
Frequently used in computer vision competitions such as LUAI 2021, the dataset is split into training, validation, and test sets.
Basic characteristics:
1.) Size: over 11,000 images and 1,800,000 object instances
2.) Includes annotations
3.) Type of license: Custom
6. DAVIS 2017
DAVIS (Densely Annotated VIdeo Segmentation) is a benchmark dataset for video object segmentation and has been used in several competitions.
The dataset consists of 150 short video sequences with around 13,000 densely annotated frames, split into training, validation, and test sets. Supervised, semi-supervised, and unsupervised challenge evaluations are available.
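Here is a minimal sketch of walking the sequences and frames; the root path and the JPEGImages/Annotations layout are assumptions based on the public release.

```python
# Minimal sketch: listing DAVIS sequences, frames, and per-frame masks.
from pathlib import Path

root = Path("DAVIS")  # hypothetical dataset root
for seq_dir in sorted((root / "JPEGImages" / "480p").iterdir()):
    frames = sorted(seq_dir.glob("*.jpg"))
    masks = sorted((root / "Annotations" / "480p" / seq_dir.name).glob("*.png"))
    print(f"{seq_dir.name}: {len(frames)} frames, {len(masks)} annotated masks")
```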
Basic characteristics:
1.) Size: 150 video sequences
2.) Includes annotations
3.) Type of license: Unknown
7. BDD100k
BDD100K (Berkeley DeepDrive) is a prominent dataset for autonomous driving. It features 100,000 videos divided into training, validation, and test sets, and there are also variants that provide image subsets extracted from the videos.
The dataset provides ground truth labels in JSON format for all commonly occurring objects, as well as lane markings and two types of segmentation: instance and pixel-wise semantic. Each frame also carries metadata such as time of day and weather. Additionally, over 300 pre-trained models are available for inspection in the BDD Model Zoo.
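Here is a minimal sketch of reading those JSON labels; the file name and field names (name, attributes, labels, box2d) are assumptions based on the public label format and should be verified against the release you download.

```python
# Minimal sketch: reading BDD100K image labels and their 2D boxes.
import json

with open("bdd100k_labels_images_val.json") as f:  # hypothetical file name
    frames = json.load(f)

frame = frames[0]
attrs = frame.get("attributes", {})
print(frame["name"], attrs.get("weather"), attrs.get("timeofday"))

for label in frame.get("labels", []):
    if "box2d" in label:  # some labels are lane markings or drivable areas instead
        b = label["box2d"]
        print(f"  {label['category']}: ({b['x1']:.0f}, {b['y1']:.0f})"
              f" to ({b['x2']:.0f}, {b['y2']:.0f})")
```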
BDD100K's scale and breadth of annotations have made it a go-to choice for multi-task learning and computer vision problems to this day.
Basic characteristics:
1.) Size: around 100,000 videos
2.) Includes annotations
3.) Type of license: BSD 3-Clause
8. nuScenes
nuScenes is one of the most complete large-scale datasets for autonomous driving, created by the Motional team.
While many autonomous driving datasets focus solely on camera-based perception, nuScenes covers a full sensor suite, including cameras, lidar, and radar, at a larger data volume.
Although created by a private company, the dataset is free to use in non-commercial settings; Motional offers licenses for commercial use.
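The team also ships a Python devkit (nuscenes-devkit); here is a minimal sketch of browsing a scene with it, assuming the v1.0-mini split has been downloaded to a local data root.

```python
# Minimal sketch: exploring a nuScenes scene with the official devkit.
# pip install nuscenes-devkit; the data root below is a hypothetical local path.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)

scene = nusc.scene[0]                                      # first driving scene
sample = nusc.get("sample", scene["first_sample_token"])   # first annotated keyframe

# Each sample links camera, lidar, and radar data plus 3D box annotations.
print("sensor channels:", list(sample["data"].keys()))
for ann_token in sample["anns"][:5]:
    ann = nusc.get("sample_annotation", ann_token)
    print(f"  {ann['category_name']}: size={ann['size']}, center={ann['translation']}")
```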
Basic characteristics:
1.) Size: 1,000 driving scenes with 1,400,000 camera images, 1,400,000 bounding boxes, and 390,000 lidar sweeps
2.) Includes annotations
3.) Type of license: Custom
9. KITTI
KITTI is well known in autonomous driving and computer vision circles, and it is something of a pioneer among autonomous driving datasets.
The collection includes GPS/IMU measurements, lidar sweeps of roughly 100,000 points per frame, and sensor calibration data, and it supports a range of autonomous driving tasks. The object detection benchmarks each include about 7,500 training and 7,500 test images.
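Here is a minimal sketch of loading one lidar sweep and its object labels; the paths are assumptions, and the parsing reflects the published KITTI formats (binary float32 lidar points and whitespace-separated label lines).

```python
# Minimal sketch: loading a KITTI lidar sweep and its 2D object boxes.
import numpy as np

# Hypothetical paths inside the KITTI object detection benchmark layout.
points = np.fromfile("training/velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)
print("lidar points in this frame:", points.shape[0])  # columns: x, y, z, reflectance

with open("training/label_2/000000.txt") as f:
    for line in f:
        fields = line.split()
        obj_type = fields[0]                                 # e.g. Car, Pedestrian, Cyclist
        left, top, right, bottom = map(float, fields[4:8])   # 2D box in image coordinates
        print(f"  {obj_type}: ({left:.0f}, {top:.0f}) to ({right:.0f}, {bottom:.0f})")
```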
Basic characteristics:
1.) Size: 100,000 images captured over several hours of driving, with about 7,500 in the object detection benchmark
2.) Includes annotations
3.) Type of license: Creative Commons Attribution-NonCommercial-ShareAlike 3.0
10. SUN RGB-D
Released in 2015, this dataset comprises over 10,000 hand-labeled images split roughly evenly between training and testing. The images were captured indoors and depict everyday objects found in homes and offices. SUN RGB-D is a standard benchmark for object detection.
The images are densely annotated across roughly 700 object classes, including semantic segmentation, room layout, and both 2D and 3D bounding boxes, and there are object detection challenges in both 2D and 3D.
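To illustrate how the 2D and 3D boxes relate, here is a minimal sketch that projects the corners of a 3D box through a pinhole camera intrinsics matrix and takes the enclosing 2D rectangle; all numbers are made-up toy values, not SUN RGB-D calibration data.

```python
# Minimal sketch: projecting a 3D bounding box into a 2D box with pinhole intrinsics.
import itertools
import numpy as np

K = np.array([[525.0, 0.0, 320.0],   # assumed intrinsics: fx, fy, cx, cy
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])

center = np.array([0.2, 0.1, 2.5])   # box center in camera coordinates (meters)
size = np.array([0.6, 0.8, 0.5])     # box extent along x, y, z

# Enumerate the eight corners, project them, and take the enclosing rectangle.
corners = np.array([center + 0.5 * size * np.array(s)
                    for s in itertools.product((-1, 1), repeat=3)])
proj = (K @ corners.T).T
pixels = proj[:, :2] / proj[:, 2:3]  # perspective divide

x_min, y_min = pixels.min(axis=0)
x_max, y_max = pixels.max(axis=0)
print(f"2D box: ({x_min:.0f}, {y_min:.0f}) to ({x_max:.0f}, {y_max:.0f})")
```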
Basic characteristics:
1.) Size: 10,335 images with around 700 object categories
2.) Includes annotations
3.) Type of license: Unknown