Preliminary Evaluation of an LLM-Based System for Grading and Providing Feedback on Short-Text Answers in Data Science Exercises

Cofini, V.; Jobe, T.; Letteri, I.; Vittorini, P.

doi:10.1007/978-3-032-05070-0_9

Large Language Models (LLMs) have shown promise in automating feedback, enhancing accessibility, and customizing support. However, their integration into educational frameworks requires careful consideration of pedagogical effectiveness and accuracy. This paper describes an experimental setup in the specific domain of data science, designed to assess the ability of LLMs to provide accurate and helpful feedback to students. The dataset used in our study was obtained from a self-learning platform for medical students taking data science courses at the University of L’Aquila (Italy). We found that the most effective approach involved tailoring prompts to the type of statistical test (normality tests or hypothesis tests). The accuracy of the LLM in providing KR feedback (i.e., right/wrong classification) was 0.93. The LLM returned adequate feedback explaining the mistake in more than 75% of cases, with more difficulty when the feedback concerned the interpretation of a hypothesis test, where it was adequate in only 71% of cases. In summary, these findings are consistent with the growing literature on the use of LLMs in statistics, reinforcing their potential in this area of research. Longitudinal monitoring would be necessary to track how model improvements affect performance on educational feedback tasks over time, as conclusions valid for current models may quickly become outdated as the technology evolves.
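The two figures reported above (accuracy of the right/wrong classification, and the share of adequate explanatory feedback per type of statistical test) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the record layout, and the toy data are all hypothetical and serve only to show how such proportions are typically computed.

```python
from collections import defaultdict


def kr_accuracy(gold, predicted):
    """Fraction of answers whose predicted right/wrong label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one answer is required")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def adequacy_by_type(ratings):
    """Share of feedback rated adequate, grouped by test type.

    `ratings` is an iterable of (test_type, is_adequate) pairs.
    """
    totals = defaultdict(int)
    adequate = defaultdict(int)
    for test_type, is_adequate in ratings:
        totals[test_type] += 1
        adequate[test_type] += int(is_adequate)
    return {t: adequate[t] / totals[t] for t in totals}


# Hypothetical toy data, NOT taken from the paper.
gold = [True, False, True, True]        # human right/wrong labels
predicted = [True, False, False, True]  # labels returned by the model
ratings = [
    ("normality", True),
    ("hypothesis", False),
    ("hypothesis", True),
    ("normality", True),
]

print(kr_accuracy(gold, predicted))
print(adequacy_by_type(ratings))
```

Grouping the adequacy ratings by test type mirrors the distinction the abstract draws between normality tests and hypothesis tests.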


2025-01-01



Year: 2025
ISBN: 9783032050694, 9783032050700
Appears in types: 4.1 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11697/283559
