The use of artificial intelligence has proved useful for automating the grading process, especially when an assessment involves a large number of students. The general problem we address is the automated grading of assignments whose solutions consist of a list of commands, their outputs, and possible comments. In this paper, we focus on the automated classification of the comments as “right” or “wrong”. In particular, we investigated the effect of different features (i.e., fastText, BERT, distance-based, and custom features), fed to several classifiers (i.e., Logistic Regression, Support Vector Machines, Random Forest, and Multi-Layer Perceptron, MLP), in order to select the best one in terms of balanced accuracy. In the experiment carried out, the best result was obtained by the MLP classifier using the fastText embeddings. When fed with BERT embeddings instead, the MLP obtained a slightly lower accuracy and F1 score, although it remained the best option among the classifiers considered. Furthermore, we tested the classifier on comments written for different assignments (of the same structure), submitted by different students and evaluated by a different professor. Also in this case, we achieved relatively good accuracy and F1 scores.
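The feature/classifier comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, and synthetic random vectors stand in for the fastText or BERT embeddings of the student comments; the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder "embeddings": in the paper these would be fastText or BERT
# vectors of each comment; here, synthetic two-class data of the same shape.
n, dim = 200, 32
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, dim)),
               rng.normal(1.0, 1.0, (n // 2, dim))])
y = np.array([0] * (n // 2) + [1] * (n // 2))  # 0 = "wrong", 1 = "right"

# The four classifier families compared in the abstract
# (settings here are illustrative, not the paper's).
classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0),
}

# Model selection criterion: mean cross-validated balanced accuracy.
scores = {name: cross_val_score(clf, X, y, cv=5,
                                scoring="balanced_accuracy").mean()
          for name, clf in classifiers.items()}
best = max(scores, key=scores.get)
print(f"best classifier: {best} (balanced accuracy {scores[best]:.3f})")
```

Balanced accuracy is used as the selection metric because the two comment classes need not be equally frequent; it averages per-class recall, so a classifier cannot score well by favoring the majority class.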