Achieving effective generalization in machine learning models is particularly challenging for small, high-dimensional datasets. The combination of many features and few training instances often leads to overfitting and poor performance on unseen data. This study analyzes in depth the performance of HYB-PARSIMONY on high-dimensional datasets and introduces a novel methodology that integrates HYB-PARSIMONY with Bayesian Optimization (BO) to address this issue through iterative feature and hyperparameter selection. The methodology runs HYB-PARSIMONY with multiple random seeds to identify the features with the highest mean probability, and then tunes hyperparameters with BO to further improve model performance. The experimental results show that the combined approach yields models that are not only parsimonious but also generalize better than those obtained with previous methods. By iteratively refining the feature selection and the hyperparameters, the proposed approach provides a more robust framework for building accurate machine learning models, even for problems with a large number of features. In conclusion, integrating HYB-PARSIMONY with BO significantly improves model generalization and reduces feature complexity, making it a promising methodology for small, high-dimensional datasets.
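The seed-averaging step described above can be illustrated with a minimal sketch. It assumes that "mean probability" can be read as the fraction of runs in which a feature is selected; that reading, the 0.5 threshold, and the `toy_selector` function are illustrative assumptions. In particular, `toy_selector` is a hypothetical stand-in for one stochastic selection run and is not the HYB-PARSIMONY algorithm or its API. The subsequent BO hyperparameter tuning is not shown.

```python
import random


def toy_selector(n_features, informative, seed):
    """Hypothetical stand-in for one stochastic feature-selection run.

    Returns a 0/1 mask. Informative features are kept with probability 0.9,
    all others with probability 0.2. This is NOT HYB-PARSIMONY.
    """
    rng = random.Random(seed)
    return [
        1 if rng.random() < (0.9 if j in informative else 0.2) else 0
        for j in range(n_features)
    ]


def mean_selection_probability(masks):
    """Average the 0/1 masks column-wise across runs (one mask per seed)."""
    n_runs = len(masks)
    return [sum(column) / n_runs for column in zip(*masks)]


def select_features(probs, threshold=0.5):
    """Keep indices whose mean probability reaches the (assumed) threshold."""
    return [j for j, p in enumerate(probs) if p >= threshold]


# Run the toy selector with many seeds and aggregate the results.
informative = {0, 3, 7}  # assumed ground truth for the toy problem
masks = [toy_selector(10, informative, seed) for seed in range(200)]
probs = mean_selection_probability(masks)
selected = select_features(probs, threshold=0.5)
print(selected)
```

Averaging over seeds damps the run-to-run variability of a stochastic search: a feature that is picked only occasionally ends up with a low mean probability and is dropped, while consistently selected features are kept for the later tuning stage.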