YIC2025

Exploring AI-based feature selection techniques for geoeffective solar events prediction

  • Camattari, Fabiana (Università di Genova)
  • Guastavino, Sabrina (Università di Genova)
  • Marchetti, Francesco (Università di Padova)
  • Perracchione, Emma (Politecnico di Torino)
  • Piana, Michele (Università di Genova)
  • Massone, Anna Maria (Università di Genova)

In machine learning, enhancing sparsity is a crucial issue. Specifically, in supervised learning settings, feature selection techniques can improve both model performance and interpretability. These techniques help identify which features contribute to the model's decision-making process, and they can reduce the model's complexity while preserving or even enhancing its performance. Feature selection is frequently carried out with methods that are independent of the model (in this case, a classifier) that then performs the prediction on the reduced set of features. State-of-the-art methods include the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Support Vector Machines (SVMs). In this work we explore a new “greedy” feature selection approach, which iteratively selects the most important features based on the chosen classifier and performance score. The benefits of such an approach are investigated theoretically in terms of model capacity indicators, such as the Kernel Alignment for SVM-based greedy strategies, which show that the algorithm constructs classifiers whose capacity grows at each step. The scheme is then applied to a real-world problem: predicting geoeffective manifestations of the Sun. It is compared with other well-established techniques, and the results show that the extracted features are indeed the most prominent ones associated with the physical processes involved in the transfer of energy from Coronal Mass Ejections, which are the primary drivers of geomagnetic storms. Accordingly, the classification metrics indicate improved predictive ability for models that use the reduced set of features, both for the greedy SVM-based classifier and for a Feed-Forward Neural Network.
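
As an illustration of the general idea of a classifier-driven greedy selection, the following is a minimal sketch and not the authors' implementation. It assumes a forward scheme in which, at each step, the feature that most improves a validation score is added to the selected set. The nearest-centroid classifier, the accuracy score, the toy data, and all function names (`greedy_select`, `make_toy_data`, and so on) are hypothetical stand-ins chosen to keep the example dependency-free; the abstract's SVM-based and neural-network classifiers are not reproduced here.

```python
import random


def fit_centroids(X, y):
    """Stand-in classifier (assumption): compute one centroid per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in X]


def accuracy(y_true, y_pred):
    """Stand-in performance score (assumption): fraction of correct labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def project(X, cols):
    """Keep only the columns listed in cols."""
    return [[row[j] for j in cols] for row in X]


def greedy_select(X_tr, y_tr, X_val, y_val, n_features):
    """Forward greedy selection driven by the classifier and the score."""
    selected, history = [], []
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < n_features:
        best_j, best_score = None, -1.0
        for j in remaining:
            cols = selected + [j]
            model = fit_centroids(project(X_tr, cols), y_tr)
            score = accuracy(y_val, predict(model, project(X_val, cols)))
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
        history.append(best_score)
    return selected, history


def make_toy_data(n, seed):
    """Synthetic data (assumption): only column 2 carries class information."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(3.0 * label, 0.5)
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(noise[:2] + [informative] + noise[2:])
        y.append(label)
    return X, y


X_tr, y_tr = make_toy_data(200, seed=0)
X_val, y_val = make_toy_data(200, seed=1)
selected, history = greedy_select(X_tr, y_tr, X_val, y_val, n_features=3)
print("selected columns:", selected)
print("validation score after each step:", history)
```

In this sketch the selection depends on the classifier and score passed through the loop, so swapping in a different classifier or metric changes which features are picked. That dependence is the property that distinguishes this kind of scheme from model-independent selection.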