Forecasting software runtime metrics: A comparative study of classical statistical, neural network, and foundation models

Di Menna F.; Traini L.; Cortellessa V.

2026-01-01

Abstract

Modern software applications generate a wide range of runtime metrics, which are vital to many quality assurance activities. These data are often recorded and aggregated as time series to observe patterns and trends of various runtime aspects over time. In this context, Time Series Forecasting (TSF) offers unique opportunities for predicting software runtime behavior and identifying potential anomalies. Although TSF models have been successfully applied in fields such as economics and climatology, their capabilities for forecasting software runtime metrics remain relatively underexplored. In this paper, we conduct a comprehensive empirical evaluation of 8 TSF models on 110 real-world software runtime metrics recorded over the course of about one year. Our evaluation encompasses three classical statistical models, three neural network models, and two time series foundation models. Results show that the foundation models achieve state-of-the-art performance on TSF of software runtime metrics, outperforming other models with strong statistical significance. Our findings indicate that foundation models, despite being trained exclusively on time series data from other domains, can effectively generalize to software runtime metrics in a zero-shot setting. This makes them a convenient plug-and-play solution for practitioners and researchers aiming to integrate TSF into their software quality assurance processes. Yet, their performance is not uniformly superior across all the time series, underscoring the absence of a “silver bullet” solution.

Year: 2026
Journal: The Journal of Systems and Software
DOI: https://dx.doi.org/10.1016/j.jss.2026.112937
Publication type: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/284000
