EnseSmells : Deep ensemble and programming language models for automated code smells detection

IRIS

A smell in software source code denotes an indication of suboptimal design and implementation decisions, potentially hindering the code understanding and, in turn, raising the likelihood of being prone to changes and faults. Identifying these code issues at an early stage in the software development process can mitigate these problems and enhance the overall quality of the software. Current research primarily focuses on the utilization of deep learning-based models to investigate the contextual information concealed within source code instructions to detect code smells, with limited attention given to the importance of structural and design-related features. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics derived from pre-trained models for programming languages. We further provide a thorough analysis of how different source code embedding models affect the detection performance with respect to different code smell types. Using four widely-used code smells from well-designed datasets, our empirical study shows that incorporating design-related features significantly improves detection accuracy, outperforming state-of-the-art methods on the MLCQ dataset with improvements ranging from 5.98% to 28.26%, depending on the type of code smell.

EnseSmells : Deep ensemble and programming language models for automated code smells detection

Ho, Anh;Bui, Anh M. T.;Nguyen, Phuong;Di Salle, Amleto;Le, Bach

2025-01-01

Abstract

A smell in software source code denotes an indication of suboptimal design and implementation decisions, potentially hindering the code understanding and, in turn, raising the likelihood of being prone to changes and faults. Identifying these code issues at an early stage in the software development process can mitigate these problems and enhance the overall quality of the software. Current research primarily focuses on the utilization of deep learning-based models to investigate the contextual information concealed within source code instructions to detect code smells, with limited attention given to the importance of structural and design-related features. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics derived from pre-trained models for programming languages. We further provide a thorough analysis of how different source code embedding models affect the detection performance with respect to different code smell types. Using four widely-used code smells from well-designed datasets, our empirical study shows that incorporating design-related features significantly improves detection accuracy, outperforming state-of-the-art methods on the MLCQ dataset with improvements ranging from 5.98% to 28.26%, depending on the type of code smell.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Rivista
	
				THE JOURNAL OF SYSTEMS AND SOFTWARE
			
	Codice DOI
	
				https://dx.doi.org/10.1016/j.jss.2025.112375
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
1-s2.0-S0164121225000433-main.pdf accesso aperto Tipologia: Documento in Versione Editoriale Licenza: Creative commons Dimensione 2.08 MB Formato Adobe PDF Visualizza/Apri	2.08 MB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11697/275581

Citazioni

ND

7

3

social impact