In machine learning, there are three primary types of classification problems: binary, multiclass, and multilabel. In this blog post, we discuss how they differ and take a deep dive into multilabel classification.
Classification is the process of assigning the correct class to previously unseen data. A training dataset is used to train a machine learning model, which is then used to predict the class value for unseen data (records without a class value). Each record in the training set has a set of attributes, one of which is the target class; the unseen dataset has the same attributes except the target class.
This task falls into three categories: binary classification, multiclass classification, and multilabel classification.
A classifier is developed by training it on the training dataset. The accuracy of that classifier is then evaluated using the test set. Once the desired evaluation metrics are achieved on the test data, the model is ready to be deployed on unseen data.
Some machine learning algorithms support multi-label classification natively. Depending on the particulars of the task, neural network models can also be set up to support multi-label classification and can be effective. In multi-label classification, a misclassification is no longer a clear-cut right or wrong: predicting two of the actual labels is better than predicting none of them, because the prediction still covers a subset of the true label set.
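To make this concrete, here is a minimal sketch of a neural network set up for multi-label output, using scikit-learn's MLPClassifier, which trains one sigmoid output unit per label when given a binary indicator target. The toy dataset is invented for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy multi-label dataset (invented): 6 instances, 4 features, 3 labels.
# Each row of Y is a binary indicator vector, i.e. a label set.
X = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 0, 0],
              [0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0],
              [0, 0, 1], [1, 0, 1], [0, 1, 1]])

# MLPClassifier detects the 2-D indicator target and trains one
# sigmoid output unit per label (binary cross-entropy per label).
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, Y)

pred = clf.predict(X)   # binary indicator matrix, same shape as Y
print(pred.shape)       # (6, 3)
```

Because each output unit is thresholded independently, the network can predict any subset of the labels, including partially correct ones.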
In single-label supervised learning algorithms, each example (instance) in the training set is associated with a single label describing its property. For the training set of multi-label learning algorithms, each example is associated with multiple labels at the same time, and the task is to predict the correct label set for unseen examples.
Multi-label classification (MLC) models have demonstrated significant promise in a wide range of applications including text categorization, image classification, automatic image annotation, web mining, rule mining, information retrieval, and tag recommendation among many others.
Multi-Label Classification Techniques
Numerous methods for dealing with multi-label learning problems have been proposed. Existing algorithms fall into three groups: problem transformation approaches, algorithm adaptation methods, and ensemble methods. The first group converts the multi-label problem into one or more conventional single-label problems. The second generalises single-label algorithms to deal with multi-labeled data directly. The third combines the advantages of the two preceding approaches.
In this blog, we focus on Problem Transformation.
The problem transformation approach converts the multi-label data representation into a single-label representation that traditional single-label (SL) classification methods accept. This approach includes algorithms such as Binary Relevance (BR), Ranking by Pairwise Comparison (RPC), Label Powerset (LP), Classifier Chains (CC), and others. The algorithm adaptation approach, in contrast, modifies an existing SL classifier algorithm so that it can handle multi-label instances directly.
The problem transformation methods convert the original problem into one or more single-label classification or regression problems. The algorithm adaptation methods do not transform the problem; they modify the learning algorithms themselves to handle multi-label data. Many base classifier algorithms, such as J48, NB, SMO, AdaBoostM1, ZeroR, and Bagging, are used in problem transformation methods.
Binary relevance methods convert a multi-label dataset into multiple single-label binary datasets. One technique under binary relevance is One-vs-All (BR-OvA).
One-vs-all (OVA) methods are one of the most popular multi-label classification strategies, in which a binary classifier is trained independently for each label. It transforms the dataset with k labels into k single-label datasets and fits a binary classifier for each label.
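As a sketch of the one-vs-all strategy, scikit-learn's OneVsRestClassifier fits one independent binary classifier per label when given a binary indicator target. The toy dataset and the choice of LogisticRegression as the base learner are illustrative, not prescribed:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Toy multi-label dataset (invented): k = 3 labels as an indicator matrix.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [0.2, 0.8], [0.9, 0.1], [0.8, 0.9]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0],
              [1, 0, 1], [0, 1, 0], [1, 1, 0]])

# Binary relevance / one-vs-all: one independent binary classifier per label.
ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(X, Y)

print(len(ovr.estimators_))  # 3 -> one binary classifier per label
print(ovr.predict(X).shape)  # (6, 3)
```

The k classifiers are trained independently, which is simple and parallelisable, but it also means BR-OvA ignores any correlations between labels.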
Another technique is One-vs-One (BR-OvO), which converts a multi-label dataset into several binary datasets, each containing the instances of a different pair of labels.
The label powerset method converts a multi-label dataset into a single multi-class dataset by treating each label combination as a unique class. It achieves multi-label classification by assigning an instance to a class that represents a set of labels. For example, with three labels, each unique label set observed in the data gets its own class (e.g., C001, C110, C011, C101). Then a multi-class classifier is trained to assign each instance to one of these classes.
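The label powerset transformation can be sketched in a few lines: map each distinct label combination to a class id, train an ordinary multi-class classifier, and map predictions back. The data and the DecisionTreeClassifier base learner below are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy multi-label dataset (invented): 6 instances, 2 features, 3 labels.
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0],
              [0, 1], [1, 0]])
Y = np.array([[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1],
              [0, 0, 1], [1, 1, 0]])

# Label powerset: treat each distinct label combination as one class.
combos, y_class = np.unique(Y, axis=0, return_inverse=True)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y_class)            # ordinary multi-class training

# Map predicted class ids back to label sets.
Y_pred = combos[clf.predict(X)]
print(Y_pred.shape)            # (6, 3)
```

Note that this method can only ever predict label combinations seen during training, and the number of classes can grow exponentially with the number of labels.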
Unlike problem transformation methods, problem adaptation (or algorithm adaptation) methods address the multi-label learning problem directly by adapting some existing learning algorithms to the multi-label learning scenario.
This approach modifies the training and prediction phases of single-target methods to handle multiple labels at the same time. For example, decision trees alter the heuristic used to create splits, and Support Vector Machines (SVMs) use an additional thresholding technique. These adaptations provide a mechanism for dealing with label dependency directly. There are five families of algorithm adaptation methods: SVM-based, tree-based, neural networks, instance-based, and probabilistic.
Some algorithm adaptation methods are Multi-Label k-Nearest Neighbors (ML-kNN), Back-Propagation Multi-label Learning (BP-MLL), the Support Vector Machine with Heterogeneous Feature Kernel (SVM-HF), the Ranking Support Vector Machine (Rank-SVM), and Multi-label Naive Bayes.
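ML-kNN itself derives maximum a posteriori estimates from neighbours' label counts, but the instance-based flavour can be sketched with scikit-learn's KNeighborsClassifier, which handles multi-label indicator targets natively via a per-label neighbour vote. The toy data is invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy multi-label dataset (invented): two clusters plus a mixed region.
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0],
              [0.9, 1.1], [0.0, 1.0], [0.1, 0.9]])
Y = np.array([[1, 0], [1, 0], [0, 1],
              [0, 1], [1, 1], [1, 1]])

# KNeighborsClassifier accepts a 2-D indicator target and predicts
# each label from the neighbours' label sets (majority vote per label).
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, Y)

# A point near the lower-left cluster inherits that cluster's label set.
print(knn.predict([[0.05, 0.05]]))
```

This simple per-label vote is what the full ML-kNN algorithm refines with Bayesian reasoning about how often each label appears among a point's neighbours.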
Ensemble methods are built on top of the problem transformation and algorithm adaptation classifiers described above, and they use a weighted vote of their members' predictions to classify new data points.
Many of these methods can be combined to produce a new multi-label classifier. The aggregated algorithms can be homogeneous in that the same algorithm can be used for each classifier, or heterogeneous in that different algorithms contribute to the final classifier’s construction. The addition of an ensemble of classifiers may mitigate the disadvantages of a single base-classifier. Several ensemble methods, including the ensemble of classifier chains and random k-label sets, have been proposed.
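One such ensemble, the ensemble of classifier chains, can be sketched with scikit-learn's ClassifierChain: train several chains with random label orders and average their per-label probability estimates (an equal-weight vote here; the toy data and the choice of five chains are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Toy multi-label dataset (invented): 6 instances, 2 features, 3 labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [0.0, 0.0], [0.5, 0.5], [0.2, 0.9]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0],
              [0, 0, 1], [1, 0, 0], [0, 1, 1]])

# Each chain feeds earlier labels' predictions into later classifiers,
# so a random order per chain captures different label dependencies.
chains = [ClassifierChain(LogisticRegression(), order="random", random_state=i)
          for i in range(5)]
for chain in chains:
    chain.fit(X, Y)

# Equal-weight vote: average the chains' per-label probabilities.
proba = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
Y_pred = (proba >= 0.5).astype(int)
print(Y_pred.shape)   # (6, 3)
```

Averaging over random orders mitigates the main weakness of a single chain, namely its sensitivity to the chosen label order.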
Metrics for Multi-Label Classification
Multi-label classification necessitates metrics that differ from those used in traditional single-label classification. These are commonly classified as label-based or example-based: label-based metrics are computed for each label and then averaged across all labels, whereas example-based metrics are computed for each test example and then averaged across the test set. To evaluate multi-label classifiers, a common choice is two ranking-based measures, one-error and average precision, together with two example-based measures, subset accuracy and Hamming loss.
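A sketch of these measures on invented predictions: Hamming loss and subset accuracy come straight from scikit-learn, one-error (the fraction of examples whose top-ranked label is not actually relevant) is computed by hand, and label_ranking_average_precision_score serves as the average-precision measure:

```python
import numpy as np
from sklearn.metrics import (hamming_loss, accuracy_score,
                             label_ranking_average_precision_score)

# Ground-truth and predicted label sets for 4 test examples, 3 labels
# (invented numbers, for illustration only).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
Y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])
# Ranking scores for the ranking-based metrics (higher = more relevant).
scores = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6],
                   [0.3, 0.4, 0.8], [0.2, 0.1, 0.9]])

# Hamming loss: fraction of individual label slots predicted wrongly.
print(hamming_loss(Y_true, Y_pred))    # 2 wrong slots / 12 ≈ 0.167

# Subset accuracy: fraction of examples whose whole label set is exact.
print(accuracy_score(Y_true, Y_pred))  # 2 exact matches / 4 = 0.5

# One-error: fraction of examples whose top-ranked label is irrelevant.
top = scores.argmax(axis=1)
one_error = np.mean(Y_true[np.arange(len(Y_true)), top] == 0)
print(one_error)                       # only the third example errs -> 0.25

# Average precision over the label rankings.
print(label_ranking_average_precision_score(Y_true, scores))
```

Note how Hamming loss and subset accuracy can disagree: an example with two of three labels right still counts as fully wrong for subset accuracy but only one-third wrong for Hamming loss.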
Many real-world applications use multi-label classification tasks such as gene classification in bioinformatics, medical diagnosis, document classification, music annotation, and image recognition. All of these applications require multi-label classification algorithms that are both effective and efficient.
Text categorization is one of the most common applications of multi-label classification, with the task of assigning predefined categories to text documents.
It is important for data practitioners to have a general understanding of which multi-label classification algorithms perform best, so that they know which ones to try first. It is equally important for researchers to investigate the performance of existing algorithms, in order to gain insights that can guide the development of more effective multi-label classification algorithms.
Single-label classification is the task of classifying data in which each instance has only one class label. Multi-label classification, on the other hand, is the classification task where each instance can have two or more class labels.
Different algorithms take different approaches to classifying multi-label instances. A classifier that uses problem transformation, for example, may rank and label test instances using a probability distribution over the transformed dataset.
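For example, a binary-relevance classifier's per-label probabilities induce a ranking over the labels, and thresholding those probabilities yields the predicted label set. A minimal sketch with scikit-learn, using invented data and 0.5 as an assumed threshold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy multi-label dataset (invented): 4 instances, 2 features, 2 labels.
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Per-label probabilities give a ranking over labels for each instance;
# thresholding the probabilities at 0.5 yields the predicted label set.
proba = clf.predict_proba(X)
ranking = np.argsort(-proba, axis=1)  # labels, most to least likely
label_set = (proba >= 0.5).astype(int)
print(proba.shape, label_set.shape)   # (4, 2) (4, 2)
```

The same scores feed both kinds of evaluation: the ranking drives metrics such as one-error and average precision, while the thresholded label set drives Hamming loss and subset accuracy.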