The effects of intrajudge consistency feedback in an Angoff standard-setting procedure

Harrison, George Marcusen

Files

Harrison_George_r.pdf (2.3 MB)

Harrison_George_uh.pdf (2.35 MB)

Date

2013-08

Authors

Harrison, George Marcusen

Publisher

[Honolulu] : [University of Hawaii at Manoa], [August 2013]

Abstract

Agencies establishing performance levels on tests utilize standard-setting procedures to derive cutscores for making classificatory decisions about examinees. The credibility of standard-setting cutscores depends, in part, on two sources of internal validity evidence: intrajudge and interjudge consistency. Feedback to improve intrajudge consistency has been routinely suggested, but scarcely experimentally tested. This dissertation investigates the effect of item-level intrajudge-consistency feedback on changes in intrajudge and interjudge consistency. In this study, participants with secondary or post-secondary teaching experience served as Angoff judges, making three rounds of judgments about the probability of success of conceptualized barely proficient examinees (BPEs) on 50 vocabulary-test items. Using a randomized experimental design, I assigned participants to either a treatment (n = 18) or control (n = 18) group and facilitated 23 standard-setting sessions. Treatment-group judges received item-level intrajudge-consistency feedback; control-group judges performed an alternative between-round task. Using a multilevel-model-for-change framework, I compared the two groups in their round-to-round changes in consistency indexes. Using generalizability theory, I investigated the changes in interjudge consistency and estimated the minimum number of judges needed to achieve a degree of precision specified in previous research. Results from the multilevel analysis indicated that improvements in intrajudge consistency were significantly greater for the treatment group (p < .001). Generalizability-theory results provided evidence of improved interjudge consistency: From Round 1 to Round 3, unexplained variance decreased from 36% to 23%, dependability improved from .94 to .96, and estimates of the fixed-item standard error of the cutscore decreased from 1.49 to 1.38. Decision-study results revealed diminishing returns in precision after about 10 judges.
The findings suggest that item-level intrajudge-consistency feedback improves judges' accuracy in providing ratings that are consistent with their individual conceptualizations of the BPE. The feedback likely improves interjudge consistency by reducing variability attributed to idiosyncratic item ratings among judges. Decision-study results suggest that not only are about 10 judges sufficient for similar Angoff procedures, but also that feedback provides a benefit equivalent to hiring 2 judges. These findings contribute to the growing body of research on standard-setting feedback and provide empirical evidence for practitioners planning Angoff procedures.
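
As a rough illustration of why precision shows diminishing returns as judges are added, the sketch below computes an Angoff cutscore and a simple standard error from a judges-by-items matrix of probability ratings. It is a minimal sketch and not the dissertation's generalizability-theory analysis: it assumes judges are a simple random sample, that the cutscore is the mean of the judges' summed ratings, and that the standard error shrinks with the square root of the number of judges. The function names and the small ratings matrix are hypothetical.

```python
import math


def angoff_cutscore(ratings):
    """Mean across judges of each judge's summed item probabilities.

    ratings[j][i] is judge j's estimate of the probability that a
    barely proficient examinee answers item i correctly.
    """
    judge_sums = [sum(row) for row in ratings]
    return sum(judge_sums) / len(judge_sums)


def cutscore_standard_error(ratings):
    """Standard error of the mean of the judges' summed ratings.

    Assumption: judges are treated as a simple random sample; this is
    not the fixed-item G-theory estimate reported in the abstract.
    """
    judge_sums = [sum(row) for row in ratings]
    n_judges = len(judge_sums)
    mean = sum(judge_sums) / n_judges
    variance = sum((s - mean) ** 2 for s in judge_sums) / (n_judges - 1)
    return math.sqrt(variance / n_judges)


def projected_standard_error(se_observed, n_observed, n_projected):
    """Rescale an observed standard error to a different panel size,
    assuming it is proportional to 1 / sqrt(number of judges)."""
    return se_observed * math.sqrt(n_observed / n_projected)


# Hypothetical ratings: 3 judges by 2 items.
example_ratings = [
    [0.5, 0.5],
    [1.0, 0.0],
    [1.0, 1.0],
]

if __name__ == "__main__":
    print(angoff_cutscore(example_ratings))
    print(cutscore_standard_error(example_ratings))
    # Each added judge lowers the projected error by less than the last.
    for n in (5, 10, 15, 20):
        print(n, projected_standard_error(2.0, 10, n))
```

Under this square-root assumption, each added judge reduces the projected standard error by a smaller amount than the one before, which is the general pattern the decision-study results describe.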

Description

Ph.D. University of Hawaii at Manoa 2013.
Includes bibliographical references.

Keywords

generalizability theory

URI

http://hdl.handle.net/10125/100594

Related To

Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Educational Psychology.

Collections

Ph.D. - Educational Psychology
