Authors: Clegg, Benjamin; Fraser, Gordon; McMinn, Phil
Date Accessioned: 2021-12-24
Date Available: 2021-12-24
Date Issued: 2022-01-04
ISBN: 978-0-9981331-5-7
URI: http://hdl.handle.net/10125/79438
Abstract: Automated grading is now prevalent in software engineering courses, typically assessing the correctness of students' programs using automated test suites. However, deficiencies in test suites can result in inconsistent grading. As such, we investigate how different test suites impact grades, and the extent to which their observable properties influence those grades. We build upon existing work, using students' solution programs and test suites that we constructed using a sampling approach. We find that there is high variation in grades from different test suites, with a standard deviation of ~10.1%. We further investigate how several properties of test suites influence these grades, including the number of tests, coverage, ability to detect other faults, and uniqueness. We use our findings to provide tutors with strategies for building test suites that evaluate students' software consistently. These strategies include constructing test suites with high coverage, writing unique and diverse tests that evaluate solutions' correctness in different ways, and running the tests against artificial faults to determine their quality.
Extent: 10 pages
Language: eng
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Track: Assessment, Evaluation and Measurements (AEM)
Keywords: autograding; coverage; diagnosability; mutation; testing
Title: Diagnosability, Adequacy & Size: How Test Suites Impact Autograding
Type: text
DOI: 10.24251/HICSS.2022.107
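
The grade-variation measure described in the abstract can be illustrated with a minimal sketch. The Python snippet below is not the paper's actual methodology or data; it assumes a hypothetical solution's pass/fail results and a few hand-picked sampled suites, grades the solution as the fraction of passing tests in each suite, and reports the standard deviation of the resulting grades. All test names and results are invented for illustration.

import statistics

def grade(solution_results, test_suite):
    # Grade = percentage of the suite's tests that the solution passes.
    passed = sum(1 for test in test_suite if solution_results.get(test, False))
    return 100.0 * passed / len(test_suite)

# Hypothetical pass/fail outcomes for one student's solution.
solution_results = {"t1": True, "t2": True, "t3": False, "t4": True, "t5": False}

# Hypothetical test suites sampled from a larger pool (a simplified stand-in
# for the sampling approach mentioned in the abstract).
sampled_suites = [
    ["t1", "t2", "t3"],
    ["t2", "t4", "t5"],
    ["t1", "t4"],
]

grades = [grade(solution_results, suite) for suite in sampled_suites]
print("Grades per suite:", grades)
print("Std. dev. across suites:", statistics.stdev(grades))

Running this prints a different grade per suite for the same solution, making the consistency problem concrete: the spread of those grades is what the paper quantifies across many students and many sampled suites.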