The effects of intrajudge consistency feedback in an Angoff standard-setting procedure

Harrison, George Marcusen
[Honolulu] : [University of Hawaii at Manoa], [August 2013]
Agencies establishing performance levels on tests utilize standard-setting procedures to derive cutscores for making classificatory decisions about examinees. The credibility of standard-setting cutscores depends, in part, on two sources of internal validity evidence: intrajudge and interjudge consistency. Feedback to improve intrajudge consistency has been routinely suggested but rarely tested experimentally. This dissertation investigates the effect of item-level intrajudge-consistency feedback on changes in intrajudge and interjudge consistency. In this study, participants with secondary or post-secondary teaching experience served as Angoff judges, making three rounds of judgments about the probability of success of conceptualized barely proficient examinees (BPEs) on 50 vocabulary-test items. Using a randomized experimental design, I assigned participants to either a treatment (n = 18) or control (n = 18) group and facilitated 23 standard-setting sessions. Treatment-group judges received item-level intrajudge-consistency feedback; control-group judges performed an alternative between-round task. Using a multilevel-model-for-change framework, I compared the two groups in their round-to-round changes in consistency indexes. Using generalizability theory, I investigated the changes in interjudge consistency and estimated the minimum number of judges needed to achieve a degree of precision specified in previous research. Results from the multilevel analysis indicated that improvements in intrajudge consistency were significantly greater for the treatment group (p < .001). Generalizability-theory results provided evidence of improved interjudge consistency: from Round 1 to Round 3, unexplained variance decreased from 36% to 23%, dependability improved from .94 to .96, and estimates of the fixed-item standard error of the cutscore decreased from 1.49 to 1.38. Decision-study results revealed diminishing returns in precision after about 10 judges.
The findings suggest that item-level intrajudge-consistency feedback improves judges' accuracy in providing ratings that are consistent with their individual conceptualizations of the BPE. The feedback likely improves interjudge consistency by reducing variability attributed to idiosyncratic item ratings among judges. Decision-study results suggest not only that about 10 judges are sufficient for similar Angoff procedures, but also that feedback provides a benefit equivalent to hiring 2 additional judges. These findings contribute to the growing body of research on standard-setting feedback and provide empirical evidence for practitioners planning Angoff procedures.
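The diminishing returns and the "2 judges" equivalence reported above can be illustrated with a rough calculation. The sketch below is not the dissertation's decision-study computation. It assumes that the standard error of the cutscore shrinks in proportion to 1/sqrt(number of judges), as it does for a mean taken over randomly sampled judges, and it treats a panel of 10 judges as a hypothetical reference size for both reported standard errors. Only the two standard errors (1.49 and 1.38) come from the abstract; the function and variable names are illustrative.

```python
import math

# Reported in the abstract: fixed-item standard error of the cutscore.
SE_ROUND_1 = 1.49
SE_ROUND_3 = 1.38

# Assumption (not from the source): both SEs refer to a 10-judge panel.
ASSUMED_PANEL_SIZE = 10


def rescale_se(se_ref: float, n_ref: int, n_new: int) -> float:
    """Project an SE from n_ref judges to n_new judges, assuming SE is
    proportional to 1/sqrt(number of judges)."""
    return se_ref * math.sqrt(n_ref / n_new)


def equivalent_judges(se_before: float, se_after: float, n_judges: int) -> float:
    """Panel size that would reach se_after without feedback, starting from
    se_before at n_judges, under the same 1/sqrt(n) assumption."""
    return n_judges * (se_before / se_after) ** 2


# How many extra judges the Round 1 -> Round 3 improvement is "worth".
extra_judges = (
    equivalent_judges(SE_ROUND_1, SE_ROUND_3, ASSUMED_PANEL_SIZE)
    - ASSUMED_PANEL_SIZE
)

# Projected SE at several panel sizes, and the gain from each step of 5 judges.
panel_sizes = [5, 10, 15, 20]
projected = [rescale_se(SE_ROUND_1, ASSUMED_PANEL_SIZE, n) for n in panel_sizes]
gains = [a - b for a, b in zip(projected, projected[1:])]

print(f"extra judges equivalent to feedback: {extra_judges:.2f}")
for n, se in zip(panel_sizes, projected):
    print(f"{n:>2} judges -> projected SE {se:.2f}")
```

Under these assumptions, the improvement from 1.49 to 1.38 corresponds to somewhat fewer than two extra judges on a 10-judge panel, and each further block of five judges buys a smaller reduction in standard error than the previous one. A different reference panel size would change the first figure proportionally.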
Ph.D. University of Hawaii at Manoa 2013.
Includes bibliographical references.
generalizability theory