Improving the performance for multi-label problems through evolutionary label repopulation

Abstract: Multi-label classification has recently attracted great attention from the data mining research community. It is concerned with learning problems in which each instance can be associated with multiple classes (or labels). A characteristic of many multi-label problems is the low density of the labels, which makes classification hard because there is only sparse evidence for predicting many of the labels. In this paper we propose a new method to improve the performance of any multi-label method by means of a label repopulation strategy. We hypothesize that denser datasets may improve the performance of the algorithms. This hypothesis rests on the idea that adding new labels might make the separation surfaces easier to learn, not on the assumption that the added labels correspond to actual relevant labels missing from the dataset because of erroneous labeling. As we do not know which new labels would improve the learned models, we address the task as an optimization problem and use an evolutionary algorithm to search for the set of new labels that most improves the performance of the classification methods. An extensive comparison using 45 datasets and 9 different classification models shows the performance advantage of our approach.
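
As a rough illustration of the idea in the abstract, the sketch below computes label density for a 0/1 label matrix and runs a toy (1+1) evolutionary search over masks of added labels. It is not the algorithm used in the paper: the function names, the mutation scheme, and the `fitness` callback are all assumptions made for this example. In practice `fitness` would presumably measure the validation performance of a multi-label classifier trained on the repopulated labels.

```python
import numpy as np

def label_density(Y):
    """Fraction of active entries in a 0/1 label matrix (instances x labels)."""
    return float(np.asarray(Y).mean())

def repopulate(Y, mask):
    """Add labels where mask is 1; original labels are never removed."""
    return np.logical_or(Y, mask).astype(int)

def evolve_mask(Y, fitness, generations=50, flip_prob=0.05, seed=0):
    """Toy (1+1) evolutionary search over masks of added labels.

    `fitness` is a hypothetical callback that scores a repopulated label
    matrix (higher is better); it stands in for classifier performance.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    best = np.zeros_like(Y)                      # start with no added labels
    best_score = fitness(repopulate(Y, best))
    for _ in range(generations):
        flips = rng.random(Y.shape) < flip_prob  # bit-flip mutation
        child = np.where(flips, 1 - best, best)
        score = fitness(repopulate(Y, child))
        if score >= best_score:                  # keep the child if not worse
            best, best_score = child, score
    return best, best_score

if __name__ == "__main__":
    Y = np.array([[1, 0, 0], [0, 0, 1]])
    # Placeholder fitness: prefer a label density close to 0.5.
    toy_fitness = lambda Yr: -abs(label_density(Yr) - 0.5)
    mask, score = evolve_mask(Y, toy_fitness)
    print(label_density(Y), label_density(repopulate(Y, mask)), score)
```

Because `repopulate` only ever adds labels, the density of the repopulated matrix can never fall below that of the original; which added labels actually help is left entirely to the fitness function.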

N. García-Pedrajas, J. A. Romero del Castillo, A. de Haro-García (2024) “Improving the performance for multi-label problems through evolutionary label repopulation”, submitted.

Supplementary material with detailed results and additional plots: