The effects of intrajudge consistency feedback in an Angoff standard-setting procedure

Date

2013-08

Publisher

University of Hawaii at Manoa

Abstract

Agencies establishing performance levels on tests utilize standard-setting procedures to derive cutscores for making classificatory decisions about examinees. The credibility of standard-setting cutscores depends, in part, on two sources of internal validity evidence: intrajudge and interjudge consistency. Feedback to improve intrajudge consistency has been routinely suggested, but scarcely experimentally tested. This dissertation investigates the effect of item-level intrajudge-consistency feedback on changes in intrajudge and interjudge consistency. In this study, participants with secondary or post-secondary teaching experience served as Angoff judges, making three rounds of judgments about the probability of success of conceptualized barely proficient examinees (BPEs) on 50 vocabulary-test items. Using a randomized experimental design, I assigned participants to either a treatment (n = 18) or control (n = 18) group and facilitated 23 standard-setting sessions. Treatment-group judges received item-level intrajudge-consistency feedback; control-group judges performed an alternative between-round task. Using a multilevel-model-for-change framework, I compared the two groups in their round-to-round changes in consistency indexes. Using generalizability theory, I investigated the changes in interjudge consistency and estimated the minimum number of judges needed to achieve a degree of precision specified in previous research. Results from the multilevel analysis indicated that improvements in intrajudge consistency were significantly greater for the treatment group (p < .001). Generalizability-theory results provided evidence of improved interjudge consistency: from Round 1 to Round 3, unexplained variance decreased from 36% to 23%, dependability improved from .94 to .96, and estimates of the fixed-item standard error of the cutscore decreased from 1.49 to 1.38. Decision-study results revealed diminishing returns in precision after about 10 judges.
The findings suggest that item-level intrajudge-consistency feedback improves judges' accuracy in providing ratings that are consistent with their individual conceptualizations of the BPE. The feedback likely improves interjudge consistency by reducing variability attributed to idiosyncratic item ratings among judges. Decision-study results suggest that not only are about 10 judges sufficient for similar Angoff procedures, but also that feedback provides a benefit equivalent to hiring 2 judges. These findings contribute to the growing body of research on standard-setting feedback and provide empirical evidence for practitioners planning Angoff procedures.
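
The diminishing returns reported in the decision study can be illustrated with a minimal sketch. It assumes a fully crossed judges-by-items design with items treated as fixed, for which a commonly used generalizability-theory expression for the standard error of the mean cutscore is sqrt((var_judge + var_judge_item / n_items) / n_judges). The function names and the variance components below are hypothetical placeholders chosen for illustration; they are not the dissertation's estimates and will not reproduce its reported values.

```python
import math


def fixed_item_se(var_judge, var_judge_item, n_judges, n_items):
    """Standard error of the mean cutscore with items treated as fixed.

    Assumes a fully crossed judges-by-items design; the residual
    (judge-by-item) variance is averaged over items, and the total is
    averaged over judges.
    """
    return math.sqrt((var_judge + var_judge_item / n_items) / n_judges)


# Hypothetical variance components (NOT taken from the dissertation).
VAR_JUDGE = 20.0        # variance attributable to judges
VAR_JUDGE_ITEM = 150.0  # judge-by-item interaction plus residual variance
N_ITEMS = 50            # the study used 50 vocabulary-test items


def se_by_panel_size(max_judges):
    """Return {number of judges: standard error} for panels of 1..max_judges."""
    return {
        n: fixed_item_se(VAR_JUDGE, VAR_JUDGE_ITEM, n, N_ITEMS)
        for n in range(1, max_judges + 1)
    }


if __name__ == "__main__":
    table = se_by_panel_size(20)
    for n in (2, 5, 10, 15, 20):
        print(f"{n:2d} judges: SE = {table[n]:.3f}")
```

Because the standard error shrinks with the square root of the number of judges, each additional judge buys less precision than the previous one, which is the pattern the decision-study results describe.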

Keywords

generalizability theory

Related To

Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Educational Psychology.
