Undergraduate Pacific Studies Exam Generation and Answering Using Retrieval Augmented Generation and Large Language Models
Date
2025-01-07
Starting Page
1600
Abstract
The capabilities of large language models have increased to the point where entire textbooks can be queried using retrieval-augmented generation (RAG). This study evaluates the ability of OpenAI’s ChatGPT-3.5-Turbo and ChatGPT-4-Turbo models to create and answer exam questions based on an undergraduate textbook. Fourteen exams containing true-false, multiple-choice, and short-answer questions were created from a textbook available online. The accuracy of the models in answering these questions was assessed both with and without access to the source material. Performance was evaluated using text-similarity metrics including ROUGE-1, cosine similarity, and word embeddings. An analysis of 56 exam scores found that RAG-assisted models outperformed those without access to the textbook, and that ChatGPT-4-Turbo was more accurate than ChatGPT-3.5-Turbo on nearly all exams. The findings demonstrate the potential of generative artificial intelligence tools in academic assessments and provide insights into the comparative performance of these models.
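As an illustration of two of the text-similarity metrics named in the abstract, the following is a minimal sketch of how a model's answer might be scored against a reference answer. It is not the study's implementation: the function names (`tokenize`, `rouge1`, `cosine_similarity`), the simple regex tokenizer, and the use of bag-of-words counts for cosine similarity are all assumptions made for this example. The study may instead compute cosine similarity over embedding vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference, candidate):
    """Return (precision, recall, f1) from clipped unigram overlap, i.e. ROUGE-1."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors (an assumption, not embeddings)."""
    a = Counter(tokenize(text_a))
    b = Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, `rouge1(reference_answer, model_answer)` and `cosine_similarity(reference_answer, model_answer)` each return values between 0 and 1, with higher values indicating closer agreement with the reference answer.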
Keywords
Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences, academic examinations, generative artificial intelligence, large language models, retrieval augmented generation
Extent
10
Related To
Proceedings of the 58th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International