Diagnosability, Adequacy & Size: How Test Suites Impact Autograding
Date
2022-01-04
Abstract
Automated grading is now prevalent in software engineering courses, where students' programs are typically assessed for correctness using automated test suites. However, deficiencies in these test suites can lead to inconsistent grading. We therefore investigate how different test suites impact grades, and the extent to which their observable properties influence those grades. Building on existing work, we use students' solution programs together with test suites that we constructed using a sampling approach. We find that grades vary considerably across test suites, with a standard deviation of ~10.1%. We further examine how several properties of a test suite influence these grades: the number of tests, coverage, ability to detect other faults, and uniqueness. From our findings, we derive strategies that help tutors build test suites that evaluate students' software consistently: constructing test suites with high coverage, writing unique and diverse tests that evaluate a solution's correctness in different ways, and running the tests against artificial faults to gauge their quality.
Keywords
Assessment, Evaluation and Measurements (AEM), autograding, coverage, diagnosability, mutation, testing
Extent
10 pages
Related To
Proceedings of the 55th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International