Abstract: One of the most common problems in data mining applications is the uneven distribution of classes, which arises in many real-world domains. The class of interest is often highly underrepresented in the dataset, which harms the performance of most classifiers. One of the most successful methods for addressing the class imbalance problem is to oversample the minority class with synthetic samples. Since the original algorithm, SMOTE, introduced this approach, numerous variants have emerged, each based on a specific hypothesis about where and how new synthetic instances should be generated. In this paper, we propose a different approach that is based exclusively on evolutionary computation and places no restrictions on the creation of new synthetic instances. A thorough comparison using three classification methods, 85 datasets, and more than 90 class-imbalance strategies shows the advantages of our proposal.
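For reference, the classic SMOTE idea the abstract builds on can be sketched in a few lines. This is a minimal, standard-library-only sketch of SMOTE-style interpolation, not the BlindSMOTE method proposed in the paper. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `_k_nearest_indices`, the default `k=5`, and the fixed `seed` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def _k_nearest_indices(i: int, points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Hypothetical sketch of SMOTE-style oversampling.

    Each synthetic point is base + gap * (neighbour - base), with gap drawn
    uniformly from [0, 1), so it lies on the segment joining a minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))                    # pick a base sample
        j = rng.choice(_k_nearest_indices(i, minority, k))  # pick one neighbour
        gap = rng.random()                                  # interpolation factor
        base, neighbour = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

With four minority points at the corners of the unit square and `k=2`, every synthetic point falls on an edge of the square, because each corner's two nearest neighbours are its adjacent corners. This segment restriction is exactly the kind of hypothesis about where new instances may appear that, according to the abstract, BlindSMOTE does away with.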
Nicolás E. García-Pedrajas, José M. Cuevas-Muñoz, and A. de Haro-García (2024) “BlindSMOTE: Evolutionary computation only based synthetic minority oversampling,” submitted.
Supplementary material with detailed results and additional figures: