Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Vulnerability of LLMs in Educational Assessment |
|
Identification Number: |
doi:10.7910/DVN/OV2WAM |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2025-09-12 |
|
Version: |
1 |
|
Bibliographic Citation: |
Milani, Alfredo, 2025, "Vulnerability of LLMs in Educational Assessment", https://doi.org/10.7910/DVN/OV2WAM, Harvard Dataverse, V1 |
|
Citation |
|
|
Title: |
Vulnerability of LLMs in Educational Assessment |
|
Identification Number: |
doi:10.7910/DVN/OV2WAM |
|
Authoring Entity: |
Milani, Alfredo (https://ror.org/035mh1293) |
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Milani, Alfredo |
|
Access Authority: |
Franzoni, Valentina |
|
Access Authority: |
Florindi, Emanuele |
|
Depositor: |
Milani, Alfredo |
|
Date of Deposit: |
2025-09-12 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/OV2WAM |
|
Study Scope |
|
|
Keywords: |
Computer and Information Science, Social Sciences, Large Language Models, Prompt Injection, Education Sciences, Education Evaluation, Trustworthy AI, Human-in-the-Loop AI |
|
Abstract: |
The dataset contains the output of the experiments of a research project on the vulnerability of LLMs in educational assessment. The dataset contains: - the student assignment data, in normal form and in injected form; - the output produced by the LLMs tested (ChatGPT, Gemini, DeepSeek, Grok, Perplexity and Copilot) in the experiments evaluating the assignments, both as single documents and collectively as a group of documents, under the following names: - User Legitimate LLMs Prompts; - Normal (no injection), which provides the reference base evaluation; - Prompt Injection Pass, a type of injection experiment, called Fail-To-Top, that aims to move an assignment evaluated as FAIL by the reference base evaluation to PASS, i.e. above 35% of total points; - Prompt Injection to Top25, a type of injection experiment that aims to move an assignment with a lower reference base evaluation into the top 25%. This latter type of experiment comes in three versions, Fail-To-Top, Sat-To-Top and Good-To-Top, in which the assignments considered for injection have a reference base evaluation of, respectively, Fail (below 35%), Satisfactory (greater than 25% and below 50%) and Good (above 50% and below 75%). The names of the folders and of the output result files are accordingly self-explanatory. |
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Access |
|
|
Notes: |
CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0) |
|
Other Study Description Materials |
|
|
Related Publications |
|
|
Citation |
|
|
Title: |
"When AI is Fooled: Hidden Risks in LLM-assisted Grading" Authors: Alfredo Milani, Valentina Franzoni, Emanuele Florindi, Assel Omarbekova, Gulmira Bekmanova, Banu Yergesh in Education Sciences, ISSN 2227-7102 |
|
Identification Number: |
2227-7102 |
|
Bibliographic Citation: |
"When AI is Fooled: Hidden Risks in LLM-assisted Grading" Authors: Alfredo Milani, Valentina Franzoni, Emanuele Florindi, Assel Omarbekova, Gulmira Bekmanova, Banu Yergesh in Education Sciences, ISSN 2227-7102 |
|
Label: |
Normal_and_Injected_Assignment_Experiments.zip |
|
Text: |
The dataset contains the output of the experiments of a research project on the vulnerability of LLMs in educational assessment. The dataset contains: - the student assignment data, in normal form and in injected form; - the output produced by the LLMs tested (ChatGPT, Gemini, DeepSeek, Grok, Perplexity and Copilot) in the experiments evaluating the assignments, both as single documents and collectively as a group of documents, under the following names: - Normal (no injection), which provides the reference base evaluation; - Prompt Injection Pass, a type of injection experiment, called Fail-To-Top, that aims to move an assignment evaluated as FAIL by the reference base evaluation to PASS, i.e. above 35% of total points; - Prompt Injection to Top25, a type of injection experiment that aims to move an assignment with a lower reference base evaluation into the top 25%. This latter type of experiment comes in three versions, Fail-To-Top, Sat-To-Top and Good-To-Top, in which the assignments considered for injection have a reference base evaluation of, respectively, Fail (below 35%), Satisfactory (greater than 25% and below 50%) and Good (above 50% and below 75%). The names of the folders and of the output result files are accordingly self-explanatory. |
|
Notes: |
application/zip |