Educational and Psychological Measurement: Article Index

Estimation of Conditional Standard Errors of Measurement for MLE Scores in MST

Yuanyuan J Stirn, Won-Chan Lee

This paper proposes an information-based analytic method for calculating the conditional standard error of measurement (CSEM) in multistage testing (MST) using maximum likelihood estimation. The accuracy of the proposed method was evaluated...

Educational and Psychological Measurement. 2026 Feb 25:00131644261420391. DOI: 10.1177/00131644261420391

Misclassification Produced by Rapid-Guessing Identification Methods and Their Suitability Under Various Conditions

Santeri Holopainen, Jari Metsämuuronen, Mikko-Jussi Laakso et al.

Response Time Threshold Methods (RTTMs) are widely used to identify rapid-guessing behavior (RG) in low-stakes assessments, yet face two key challenges: (a) inevitable misclassifications due to overlapping response time distributions of eng...

Educational and Psychological Measurement. 2026 Feb 23:00131644261419426. DOI: 10.1177/00131644261419426

From Agreement to Epistemic Alignment: A Signal Detection-Theoretic Model of Inter-Rater Reliability

Irene Gianeselli

Inter-rater reliability is commonly assessed using chance-corrected agreement coefficients such as Cohen's κ, which summarize concordance among categorical judgments without modeling the inferential processes that generate them. As a resul...

Educational and Psychological Measurement. 2026 Feb 16:00131644261417643. DOI: 10.1177/00131644261417643

On the Consistency of Automatic Scoring with Large Language Models

Mingfeng Xue, Xingyao Xiao, Yunting Liu et al.

Large language models (LLMs) have shown great potential in automatic scoring. However, due to model characteristics and variation in training materials and pipelines, scoring inconsistency can exist within an LLM and across LLMs when rating...

Educational and Psychological Measurement. 2026 Feb 16:00131644261418138. DOI: 10.1177/00131644261418138

Comparing Different Approaches of (Not) Accounting for Rapid Guessing in Plausible Values Estimation

Jana Welling, Eva Zink, Timo Gnambs

Educational large-scale assessments provide information on ability differences between groups, informing policies and shaping educational decisions. However, some of these differences might partly reflect variations in test-taking motivatio...

Educational and Psychological Measurement. 2026 Jan 13:00131644251395590. DOI: 10.1177/00131644251395590

Consistent Factor Score Regression: A Better Alternative for Uncorrected Factor Score Regression?

Jasper Bogaert, Wen Wei Loh, Yves Rosseel

Researchers in the behavioral, educational, and social sciences often aim to analyze relationships among latent variables. Structural equation modeling (SEM) is widely regarded as the gold standard for this purpose. A straightforward altern...

Educational and Psychological Measurement. 2026 Jan 4:00131644251399588. DOI: 10.1177/00131644251399588

Empowering Expert Judgment: A Data-Driven Decision Framework for Standard Setting in High-Dimensional and Data-Scarce Assessments

Tianpeng Zheng, Zhehan Jiang, Zhichen Guo et al.

A critical methodological challenge in standard setting arises in small-sample, high-dimensional contexts where the number of items substantially exceeds the number of examinees. Under such conditions, conventional data-driven methods that ...

Educational and Psychological Measurement. 2026 Jan 2:00131644251405406. DOI: 10.1177/00131644251405406

Evaluation of Residual-Based Fit Statistics for Item Response Theory Models in the Presence of Non-Responses

Minho Lee, Juyoung Jung

Residual-based fit statistics, which compare observed item statistics (e.g., proportions) with model-implied probabilities, are widely used to evaluate model fit, item fit, and local dependence in item response theory (IRT) models. Despite ...

Educational and Psychological Measurement. 2025 Dec 24:00131644251393444. DOI: 10.1177/00131644251393444

Conditional Reliability of Weighted Test Scores on a Bounded D-Scale

Dimiter M Dimitrov, Dimitar V Atanasov

Based on previous research on conditional reliability for number-correct test scores, conditioned on levels of the logit scale in item response theory, this article deals with conditional reliability of classical-type weighted scores condit...

Educational and Psychological Measurement. 2025 Dec 20:00131644251396543. DOI: 10.1177/00131644251396543

Collapsing Sparse Responses in Likert-Type Scale Data: Advantages and Disadvantages for Model Fit in CFA

Jin Liu, Yu Bao, Christine DiStefano et al.

Applied researchers often encounter situations where certain item response categories receive very few endorsements, resulting in sparse data. Collapsing categories may mitigate sparsity by increasing cell counts, yet the methodological con...

Educational and Psychological Measurement. 2025 Dec 19:00131644251401097. DOI: 10.1177/00131644251401097