Massive: A brute force approximate training evolutionary approach for multi-label learning to find them all

Abstract: Multilabel classification tackles scenarios where each sample belongs simultaneously to multiple binary classes, referred to as labels. Learning from multilabel datasets is harder than single-label classification. Many of the most successful approaches are based on learning subsets of labels, usually termed labelsets. However, those methods often rely on a small random subset of the enormous labelset space, which yields suboptimal results. Searching for a better combination of labelsets faces the challenge of a huge search space and the additional difficulty of training hundreds of classifiers to assess the quality of the labelsets. In this paper, we propose an evolutionary model that massively combines hundreds of labelsets and relies on an approximate training of the labelset classifiers. To improve the exploratory search ability of our method, the population is periodically reinitialized, keeping only the labelsets that are present in the best individual of the population. In this way, our method simultaneously searches for good labelsets and for good combinations of them. A thorough comparison using 70 datasets and 10 state-of-the-art methods shows the excellent performance of our method. Additional studies on class-imbalanced, noisy, and missing-label datasets also support the efficiency of our proposal.

N. García-Pedrajas, J. M. Cuevas-Muñoz, M. Mendoza-Hurtado, and A. de Haro-García (2025) “Massive: A brute force approximate training evolutionary approach for multi-label learning to find them all,” submitted.

Supplementary material.
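
As a companion to the abstract, the following is a minimal, illustrative sketch in Python of the kind of evolutionary search it describes: individuals are combinations of labelsets, a cheap stand-in fitness plays the role of the approximate training of the labelset classifiers, and the population is periodically reinitialized around the best individual. Every function name, parameter value, and the toy fitness below is an assumption made for illustration only; this is not the authors' implementation.

# Sketch of an evolutionary search over combinations of labelsets,
# with periodic reinitialization around the best individual.
# All names and parameters are illustrative assumptions.

import random

def random_labelset(n_labels, k, rng):
    # A labelset is a subset of k label indices.
    return tuple(sorted(rng.sample(range(n_labels), k)))

def random_individual(n_labels, k, n_labelsets, rng):
    # An individual is a combination (ensemble) of labelsets.
    return [random_labelset(n_labels, k, rng) for _ in range(n_labelsets)]

def crossover(a, b, rng):
    # Uniform crossover: each position takes a labelset from either parent.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(ind, n_labels, k, rate, rng):
    # With small probability, replace a labelset by a fresh random one.
    return [random_labelset(n_labels, k, rng) if rng.random() < rate else ls
            for ls in ind]

def evolve(fitness, n_labels, k=3, n_labelsets=20, pop_size=30,
           generations=100, restart_every=25, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(n_labels, k, n_labelsets, rng)
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for gen in range(1, generations + 1):
        if gen % restart_every == 0:
            # Periodic reinitialization: only the labelsets of the current
            # best individual survive; the rest of the population is rebuilt
            # at random (one plausible reading of the abstract).
            pop = [list(best)] + [random_individual(n_labels, k, n_labelsets, rng)
                                  for _ in range(pop_size - 1)]
        new_pop = [list(best)]  # elitism
        while len(new_pop) < pop_size:
            # Tournament selection followed by crossover and mutation.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = mutate(crossover(p1, p2, rng), n_labels, k, mutation_rate, rng)
            new_pop.append(child)
        pop = new_pop
        best = max(pop + [best], key=fitness)
    return best

if __name__ == "__main__":
    # Toy fitness rewarding label coverage and labelset diversity; in the
    # paper this role is played by an approximate (cheap) training of the
    # labelset classifiers. It is only a stand-in so the sketch runs.
    N_LABELS = 15
    def toy_fitness(ind):
        covered = {label for ls in ind for label in ls}
        return len(covered) + len(set(ind)) / len(ind)
    best = evolve(toy_fitness, N_LABELS)
    print("best combination covers",
          len({label for ls in best for label in ls}), "labels")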