{"@context":{"@language":"en","@vocab":"https://schema.org/","citeAs":"cr:citeAs","column":"cr:column","conformsTo":"dct:conformsTo","cr":"http://mlcommons.org/croissant/","rai":"http://mlcommons.org/croissant/RAI/","data":{"@id":"cr:data","@type":"@json"},"dataType":{"@id":"cr:dataType","@type":"@vocab"},"dct":"http://purl.org/dc/terms/","examples":{"@id":"cr:examples","@type":"@json"},"extract":"cr:extract","field":"cr:field","fileProperty":"cr:fileProperty","fileObject":"cr:fileObject","fileSet":"cr:fileSet","format":"cr:format","includes":"cr:includes","isLiveDataset":"cr:isLiveDataset","jsonPath":"cr:jsonPath","key":"cr:key","md5":"cr:md5","parentField":"cr:parentField","path":"cr:path","recordSet":"cr:recordSet","references":"cr:references","regex":"cr:regex","repeated":"cr:repeated","replace":"cr:replace","sc":"https://schema.org/","separator":"cr:separator","source":"cr:source","subField":"cr:subField","transform":"cr:transform","wd":"https://www.wikidata.org/wiki/"},"@type":"sc:Dataset","conformsTo":"http://mlcommons.org/croissant/1.0","name":"Vulnerability of LLMs in Educational Assessment","url":"https://doi.org/10.7910/DVN/OV2WAM","creator":[{"@type":"Person","givenName":"Alfredo","familyName":"Milani","affiliation":{"@type":"Organization","name":"https://ror.org/035mh1293"},"sameAs":"https://orcid.org/0000-0003-4534-1805","@id":"https://orcid.org/0000-0003-4534-1805","identifier":"https://orcid.org/0000-0003-4534-1805","name":"Milani, Alfredo"}],"description":"The dataset contains the output of experiments on a research project on Vulnerability of LLMs in Educational Assessment. 
The Dataset contains: -the students' assignment data in normal form and in the injected form -the output produced by the tested LLMs: ChatGPT, Gemini, DeepSeek, Grok, Perplexity and Copilot for the experiments evaluating the assignments, both as a single document and collectively as a group of documents, denominated: -User Legitimate LLMs Prompts -Normal (no injection), providing the reference base evaluation -Prompt Injection Pass, one type of injection experiment, called Fail-To-Top, to move an assignment evaluated FAIL by the reference base evaluation to PASS, i.e. above 35% of total points. -Prompt Injection to Top25, a type of injection experiment to move an assignment with a lower reference base evaluation to the top 25%. This latter type of experiment comes in 3 versions, Fail-To-Top, Sat-To-Top and Good-To-Top, where assignments with reference base evaluation respectively Fail (below 35%), Satisfactory (greater than 25% and below 50%) and Good (above 50% and below 75%) are considered for injection. 
The names of the folders and output result files are accordingly self-explanatory.","keywords":["Computer and Information Science","Social Sciences","Large Language Models","Prompt Injection","Education Sciences","Education Evaluation","Trustworthy AI","Human-in-the-Loop AI"],"license":"http://creativecommons.org/publicdomain/zero/1.0","datePublished":"2025-09-12","dateModified":"2025-09-12","includedInDataCatalog":{"@type":"DataCatalog","name":"Harvard Dataverse","url":"https://dataverse.harvard.edu"},"publisher":{"@type":"Organization","name":"Harvard Dataverse"},"version":"1.0","citeAs":"@data{DVN/OV2WAM_2025,author = {Milani, Alfredo},publisher = {Harvard Dataverse},title = {Vulnerability of LLMs in Educational Assessment},year = {2025},url = {https://doi.org/10.7910/DVN/OV2WAM}}","citation":[{"@type":"CreativeWork","name":"\"When AI is Fooled: Hidden Risks in LLM-assisted Grading\" Authors: Alfredo Milani, Valentina Franzoni, Emanuele Florindi, Assel Omarbekova, Gulmira Bekmanova, Banu Yergesh in Education Sciences, ISSN 2227-7102"}],"distribution":[{"@type":"cr:FileObject","@id":"Normal_and_Injected_Assignment_Experiments.zip","name":"Normal_and_Injected_Assignment_Experiments.zip","encodingFormat":"application/zip","md5":"d6580deb1f5fd647a0b3f3ccbb31fbda","contentSize":"4804924","description":"The dataset contains the output of experiments from a research project on the\nVulnerability of LLMs in Educational Assessment.\n\nThe Dataset contains:\n-the students' assignment data in normal form and in the injected form\n-the output produced by the tested LLMs: ChatGPT, Gemini, DeepSeek, Grok, Perplexity and Copilot for the experiments evaluating the assignments, both as a single document and collectively as a group of documents, denominated:\n\n-Normal (no injection), providing the reference base evaluation\n-Prompt Injection Pass, one type of injection experiment, called Fail-To-Top, to move an assignment evaluated FAIL by the reference base evaluation to PASS, i.e. above 35% of total points.\n-Prompt Injection to Top25, a type of injection experiment to move an assignment with a lower reference base evaluation to the top 25%. This latter type of experiment comes in 3 versions, Fail-To-Top, Sat-To-Top and Good-To-Top, where assignments with reference base evaluation respectively Fail (below 35%), Satisfactory (greater than 25% and below 50%) and Good (above 50% and below 75%) are considered for injection.\n\nThe names of the folders and output result files are accordingly self-explanatory.","contentUrl":"https://dataverse.harvard.edu/api/access/datafile/12068843"}]}