Quality by Prompt: LLM-Powered Transformation of Data Quality Requirements Into Great Expectations

Abughazala, Moamin; Ibiyo, Motunrayo; Muccini, Henry; Sharaf, Mohammad

doi:10.1007/978-3-032-04190-6_9

Ensuring data quality is critical for reliable decision-making, analytics, and machine learning applications. Traditional data validation methods often depend on manually defining quality rules, a process that is time-consuming, error-prone, and difficult to scale. Great Expectations (GEs) is a widely adopted framework for data validation; however, crafting its rules manually introduces challenges in scalability, domain adaptability, and syntactic complexity. This study explores the use of Large Language Models (LLMs) to automate the conversion of natural language data quality requirements into structured GEs validation rules. We fine-tune the LLaMA-3.2-3B-bnb-4bit model using Low-Rank Adaptation (LoRA) on real-world datasets sourced from the telecommunications and IT sectors. To evaluate the effectiveness of this approach, we apply standard NLP metrics (ROUGE, BLEU, METEOR, and BERTScore) alongside practical QA metrics such as rule completeness and manual effort reduction. Our results demonstrate that the fine-tuned LLM significantly outperforms generic models, generating rules with greater fluency, accuracy, and domain alignment.

2026-01-01

Year: 2026

ISBN: 9783032041890, 9783032041906

Publication type: 4.1 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11697/284163
