<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.5/metadata.xsd">
  <identifier identifierType="DOI">10.7910/DVN/OV2WAM</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Milani, Alfredo</creatorName>
      <givenName>Alfredo</givenName>
      <familyName>Milani</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org">https://orcid.org/0000-0003-4534-1805</nameIdentifier>
      <affiliation affiliationIdentifier="https://ror.org/035mh1293" schemeURI="https://ror.org" affiliationIdentifierScheme="ROR">Link Campus University</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Vulnerability of LLMs in Educational Assessment</title>
  </titles>
  <publisher>Harvard Dataverse</publisher>
  <publicationYear>2025</publicationYear>
  <subjects>
    <subject>Computer and Information Science</subject>
    <subject>Social Sciences</subject>
    <subject>Large Language Models</subject>
    <subject>Generative AI</subject>
    <subject>Prompt Injection</subject>
    <subject>Education Sciences</subject>
    <subject>Education Evaluation</subject>
    <subject>Trustworthy AI</subject>
    <subject>Human-in-the-Loop AI</subject>
  </subjects>
  <contributors>
    <contributor contributorType="ContactPerson">
      <contributorName nameType="Personal">Milani, Alfredo</contributorName>
      <givenName>Alfredo</givenName>
      <familyName>Milani</familyName>
      <affiliation>Link Campus University, Rome, Italy</affiliation>
    </contributor>
    <contributor contributorType="ContactPerson">
      <contributorName nameType="Personal">Franzoni, Valentina</contributorName>
      <givenName>Valentina</givenName>
      <familyName>Franzoni</familyName>
      <affiliation>University of Perugia, Italy</affiliation>
    </contributor>
    <contributor contributorType="ContactPerson">
      <contributorName nameType="Personal">Florindi, Emanuele</contributorName>
      <affiliation>University of Modena-Reggio Emilia</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Submitted">2025-09-12</date>
    <date dateType="Available">2025-09-12</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <relatedIdentifiers>
    <relatedIdentifier relationType="IsSupplementTo" relatedIdentifierType="ISSN">2227-7102</relatedIdentifier>
  </relatedIdentifiers>
  <sizes>
    <size>4804924</size>
  </sizes>
  <formats>
    <format>application/zip</format>
  </formats>
  <version>1.0</version>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/openAccess"/>
    <rights rightsURI="http://creativecommons.org/publicdomain/zero/1.0" rightsIdentifier="CC0-1.0" rightsIdentifierScheme="SPDX" schemeURI="https://spdx.org/licenses/" xml:lang="en">Creative Commons CC0 1.0 Universal Public Domain Dedication.</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">The dataset contains the output of experiments on a research project on 
Vulnerability of LLMs in Educational Assessment.

The Dataset contains:
-the students' assignment data in normal form and in the injected form
-the output produced by the tested LLMs: ChatGPT, Gemini, DeepSeek, Grok, Perplexity and Copilot for the experiments evaluating the assignments, both as a single document and collectively as a group of documents, denominated:
 
-User Legitimate LLMs Prompts
-Normal (no injection) providing the reference base evaluation
 -Prompt Injection Pass, one type of injection experiment, called Fail-To-Top, to move an assignment evaluated FAIL by the reference base evaluation to PASS, i.e. above 35% of total points.
 -Prompt Injection to Top25, a type of injection experiment to move to the top 25% an assignment with a lower reference base evaluation. This latter type of experiment comes in 3 versions, Fail-To-Top, Sat-To-Top, Good-To-Top, where assignments with reference base evaluation respectively Fail (below 35%), Satisfactory (greater than 25% and below 50%) and Good (above 50% and below 75%) are considered for injection.

The names of the folders and output result files are accordingly self-explanatory.</description>
  </descriptions>
</resource>
