The effects of intrajudge consistency feedback in an Angoff standard-setting procedure

Date
2013-08
Authors
Harrison, George Marcusen
Publisher
[Honolulu] : [University of Hawaii at Manoa], [August 2013]
Abstract
Agencies establishing performance levels on tests use standard-setting procedures to derive cutscores for classifying examinees. The credibility of standard-setting cutscores depends, in part, on two sources of internal validity evidence: intrajudge and interjudge consistency. Feedback to improve intrajudge consistency has been routinely recommended but rarely tested experimentally. This dissertation investigates the effect of item-level intrajudge-consistency feedback on changes in intrajudge and interjudge consistency. In this study, participants with secondary- or post-secondary teaching experience served as Angoff judges, making three rounds of judgments about the probability of success of conceptualized barely proficient examinees (BPEs) on 50 vocabulary-test items. Using a randomized experimental design, I assigned participants to either a treatment (n = 18) or control (n = 18) group and facilitated 23 standard-setting sessions. Treatment-group judges received item-level intrajudge-consistency feedback; control-group judges performed an alternative between-round task. Using a multilevel-model-for-change framework, I compared the two groups in their round-to-round changes in consistency indexes. Using generalizability theory, I investigated changes in interjudge consistency and estimated the minimum number of judges needed to achieve a degree of precision specified in previous research. Results from the multilevel analysis indicated that improvements in intrajudge consistency were significantly greater for the treatment group (p < .001). Generalizability-theory results provided evidence of improved interjudge consistency: From Round 1 to Round 3, unexplained variance decreased from 36% to 23%, dependability improved from .94 to .96, and estimates of the fixed-item standard error of the cutscore decreased from 1.49 to 1.38. Decision-study results revealed diminishing returns in precision after about 10 judges.
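To make the Angoff quantities above concrete, the sketch below computes a raw-score cutscore from a judges-by-items matrix of probability ratings (each judge's ratings summed over items, then averaged over judges) and one simple, widely used intrajudge-consistency proxy: the Pearson correlation between a judge's ratings and the items' empirical p-values. This is an illustration under stated assumptions, not the dissertation's method; its own consistency indexes are not specified here and may instead condition on each judge's own cutscore. The `flag_inconsistent_items` helper is hypothetical: it shows one possible form of item-level feedback (flagging items whose rank among the judge's ratings departs most from their rank in empirical easiness).

```python
import numpy as np


def angoff_cutscore(ratings):
    """Raw-score Angoff cutscore from a judges x items matrix of probability
    ratings: sum each judge's ratings over items, then average over judges."""
    x = np.asarray(ratings, dtype=float)
    return float(x.sum(axis=1).mean())


def intrajudge_correlation(judge_ratings, p_values):
    """Assumed proxy for intrajudge consistency: Pearson r between one
    judge's item ratings and the items' empirical p-values."""
    return float(np.corrcoef(judge_ratings, p_values)[0, 1])


def flag_inconsistent_items(judge_ratings, p_values, k=5):
    """Hypothetical item-level feedback: indices of the k items whose rank
    in the judge's ratings differs most from their rank in empirical
    easiness (p-value). Ties are broken by item position."""
    r = np.asarray(judge_ratings, dtype=float)
    p = np.asarray(p_values, dtype=float)
    rating_rank = r.argsort().argsort()  # 0 = item the judge rated hardest
    pvalue_rank = p.argsort().argsort()  # 0 = empirically hardest item
    gap = np.abs(rating_rank - pvalue_rank)
    return np.argsort(-gap, kind="stable")[:k].tolist()
```

For example, `intrajudge_correlation([0.9, 0.8, 0.3, 0.2], [0.95, 0.85, 0.4, 0.3])` is close to 1, because the judge's ratings track item easiness almost perfectly. A judge whose ratings ran against the p-values would get a low or negative value, and the flagged items would show where that judge's ratings diverge most.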
The findings suggest that item-level intrajudge-consistency feedback improves judges' accuracy in providing ratings that are consistent with their individual conceptualizations of the BPE. The feedback likely improves interjudge consistency by reducing variability attributable to idiosyncratic item ratings among judges. Decision-study results suggest not only that about 10 judges are sufficient for similar Angoff procedures but also that the feedback provides a benefit equivalent to hiring 2 additional judges. These findings contribute to the growing body of research on standard-setting feedback and provide empirical evidence for practitioners planning Angoff procedures.
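The decision-study reasoning about "about 10 judges" and "a benefit equivalent to hiring 2 judges" can be sketched for a fully crossed judges (j) × items (i) design. The code estimates variance components with the usual two-way ANOVA expected-mean-squares method, truncating negative estimates to zero. It then projects the fixed-item standard error of the cutscore, sqrt((σ²(j) + σ²(ji,e)/n_i)/n_j), on the mean-rating scale; multiplying by n_i gives the raw-score scale. The variance components in the usage block are hypothetical, not the dissertation's estimates, and `judges_needed` is a hypothetical helper. One way to express a "judges-equivalent" benefit is to find how many judges one condition would need to match another condition's standard error.

```python
import numpy as np


def g_study_ji(ratings):
    """ANOVA estimates of variance components for a crossed judges x items
    design with one rating per cell; negative estimates truncated to 0."""
    x = np.asarray(ratings, dtype=float)
    n_j, n_i = x.shape
    grand = x.mean()
    ss_j = n_i * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_i = n_j * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_ji = np.sum((x - grand) ** 2) - ss_j - ss_i  # residual (ji,e)
    ms_j = ss_j / (n_j - 1)
    ms_i = ss_i / (n_i - 1)
    ms_ji = max(ss_ji / ((n_j - 1) * (n_i - 1)), 0.0)
    return {
        "j": max((ms_j - ms_ji) / n_i, 0.0),
        "i": max((ms_i - ms_ji) / n_j, 0.0),
        "ji,e": ms_ji,
    }


def fixed_item_se(vc, n_j, n_i):
    """Fixed-item SE of the cutscore (mean-rating scale) for n_j judges and
    n_i items: sqrt((var_j + var_ji,e / n_i) / n_j)."""
    return float(np.sqrt((vc["j"] + vc["ji,e"] / n_i) / n_j))


def judges_needed(vc, n_i, target_se, max_judges=100):
    """Hypothetical helper: smallest n_j whose fixed-item SE <= target_se,
    or None if max_judges is not enough."""
    for n_j in range(1, max_judges + 1):
        if fixed_item_se(vc, n_j, n_i) <= target_se:
            return n_j
    return None


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    vc = {"j": 0.004, "i": 0.02, "ji,e": 0.012}
    for n_j in (1, 5, 10, 15, 20):
        print(n_j, round(fixed_item_se(vc, n_j, n_i=50), 4))
```

Because the standard error shrinks in proportion to 1/√n_j when the item set is fixed, doubling a panel from 10 to 20 judges reduces it by only about 29%. This is the diminishing-returns pattern a decision study makes visible.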
Description
Ph.D. University of Hawaii at Manoa 2013.
Includes bibliographical references.
Keywords
generalizability theory
Related To
Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Educational Psychology.