Revisiting reliability with human and machine learning raters under scoring design and rater configuration in the many-facet Rasch model
Constructed-response (CR) items are widely used to assess higher-order skills but require human scoring, which introduces variability and is costly at scale. Machine learning (ML)-based scoring offers a scalable alternative, yet its psychometric consequences i... ...