Fuzzy clustering with robust learning models for soccer player profiling and resilience analysis
Pacifico, Antonio
2025-01-01
Abstract
This study proposes an integrated analytical framework to enhance football performance analytics by combining feature engineering, fuzzy clustering, interpretable machine learning, and topological network analysis. The framework is designed to extract latent offensive profiles and predict high-efficiency scoring profiles across domestic and international competitions. The approach begins by constructing three composite indicators (Index of Offensive Efficiency, Competitive Resilience Index, and Versatility Score) designed to capture multidimensional aspects of a player’s offensive productivity, adaptability across competitions, and contribution breadth. These engineered metrics inform a fuzzy clustering algorithm that reveals two core performance profiles: “Seasoned Finishing Specialists” and “Emerging Versatile Contributors”. Building on this segmentation, a supervised learning model based on XGBoost is employed to predict the likelihood of surpassing a goals-per-shot efficiency threshold. Model interpretability is ensured via SHAP plots, which highlight the pivotal role of salary, finishing metrics, and competition-specific resilience. Partial dependence plots further expose nonlinear and interactive effects between key predictors. A network-based analysis complements the model by mapping performance similarities and identifying both archetypal and transitional performers via centrality measures. Robustness checks, including alternative winsorization, fuzziness levels, and subgroup-specific clustering, confirm the stability of the results. Overall, the proposed framework bridges segmentation and prediction with transparency and domain relevance, offering a comprehensive toolkit for decision-makers in sports analytics, recruitment, and talent management.

| File | Size | Format |
|---|---|---|
| JBG_Pacifico.pdf (open access; publisher's version; Creative Commons license) | 4.39 MB | Adobe PDF |


