Abstract: Ensembles of classifiers are among the most powerful and widely used classification methods in Machine Learning, and they have proven beneficial in almost all fields of application that involve classification problems. Over the past few decades, numerous ensemble methods have emerged, yet a thorough comparison remains absent. In this paper, we carry out the most extensive comparison of ensemble methods attempted to date, testing more than 200 ensemble algorithms on a benchmark of 500 datasets, for a total of 1,183,880 experiments. Five different classification performance metrics are considered in the comparison. We first examine the general comparative performance of the methods, and then analyze their behavior in relation to the number of features, the class imbalance of the datasets, the number of classes, and the size of the datasets. We also study the robustness of the best methods in the presence of noise and their behavior from the point of view of a $\kappa$-error diagram. As a general rule, gradient boosting and methods based on strong randomization, such as RotationForest, ExtraTrees, and RandomForest, obtained the best results. However, support vector machines, whether used alone or as part of a stacking classifier, also demonstrated competitive performance. The study of algorithm behavior based on dataset characteristics revealed that the group of best-performing methods frequently depends on those characteristics.
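For orientation: in a $\kappa$-error diagram, each pair of ensemble members is plotted by its average error against the inter-rater agreement $\kappa = (\theta_1 - \theta_2)/(1 - \theta_2)$, where $\theta_1$ is the observed agreement of the pair and $\theta_2$ is the agreement expected by chance. The sketch below is a minimal illustration, not the authors' experimental pipeline, of how ensembles named in the abstract can be compared on a single dataset under several metrics using scikit-learn. The dataset, hyperparameters, base learners of the stacking classifier, and the particular metrics are assumptions made for the example (the paper's actual five metrics are not specified here), and RotationForest is omitted because scikit-learn does not provide it.

```python
# Illustrative sketch: cross-validated comparison of a few ensemble
# classifiers under multiple metrics (assumed choices, not the paper's setup).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# A binary-classification dataset used only for demonstration.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    # A stacking classifier with an SVM among its base learners,
    # echoing the abstract's observation about SVMs in stacking.
    "Stacking(SVM)": StackingClassifier(
        estimators=[
            ("svm", SVC(probability=True, random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Five classification metrics (an assumed selection for illustration).
scoring = ["accuracy", "balanced_accuracy", "f1", "roc_auc",
           "matthews_corrcoef"]

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    summary = ", ".join(
        f"{m}={scores['test_' + m].mean():.3f}" for m in scoring
    )
    print(f"{name}: {summary}")
```

Scaling such a loop to 200+ algorithms and 500 datasets, as in the paper, is a matter of iterating the same cross-validation over every (method, dataset) pair and aggregating the per-metric results.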
Nicolás E. García-Pedrajas, Juan A. Romero del Castillo, Domingo Ortiz-Boyer, Gonzalo Cerruela-García, and Aida de Haro-García (2024) “Comprehensive comparison of ensembles of classifiers for binary and multiclass problems”, submitted.
Source code:
Supplementary material