Benefits of the Curious Behavior of Bayesian Hierarchical Item Response Theory Models: An In-Depth Investigation and Bias Correction
Christoph König, Rainer W Alexandrowicz
When using Bayesian hierarchical modeling, a popular approach for Item Response Theory (IRT) models, researchers typically face a tradeoff between the precision and accuracy of the item parameter estimates. Given the pooling principle and v...
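The tradeoff the abstract alludes to stems from partial pooling: hierarchical priors shrink noisy item estimates toward the group mean, gaining precision at the cost of bias for extreme items. Below is a minimal empirical-Bayes sketch of that shrinkage in Python; the estimates, standard errors, and variance formula are illustrative assumptions, not the article's model:

```python
import numpy as np

# Hypothetical per-item difficulty estimates and their standard errors
# (values are illustrative, not from the article).
b_hat = np.array([-1.2, -0.4, 0.1, 0.9, 2.3])    # item-level MLEs
se    = np.array([0.30, 0.15, 0.10, 0.20, 0.45])  # their standard errors

mu   = b_hat.mean()  # grand mean of the item difficulties
tau2 = max(b_hat.var(ddof=1) - (se**2).mean(), 1e-6)  # crude between-item variance

# Partial pooling: noisier estimates are shrunk harder toward the mean,
# which raises precision but biases extreme items inward.
w = tau2 / (tau2 + se**2)
b_pooled = w * b_hat + (1 - w) * mu
print(np.round(b_pooled, 3))
```

Items with large standard errors get small weights w and are pulled hardest toward the mean, illustrating how pooling trades accuracy for precision.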
Detecting uniform differential item functioning for continuous response computerized adaptive testing
Chun Wang, Ruoyi Zhu
Evaluating items for potential differential item functioning (DIF) is an essential step to ensuring measurement fairness. In this article, we focus on a specific scenario, namely, the continuous response, severely sparse, computerized adapt...
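For intuition, uniform DIF means an item is systematically harder for one group at every ability level, so a basic check compares group-specific difficulty estimates. The sketch below is a generic Wald test assuming hypothetical estimates and standard errors; it is not the sparse-CAT procedure the article develops:

```python
import math
from scipy.stats import norm

def uniform_dif_wald(b_ref, se_ref, b_foc, se_foc):
    """Wald test for a group difference in item difficulty (uniform DIF).
    Inputs are group-specific difficulty estimates and standard errors."""
    z = (b_ref - b_foc) / math.sqrt(se_ref**2 + se_foc**2)
    p = 2 * norm.sf(abs(z))  # two-sided p-value
    return z, p

z, p = uniform_dif_wald(b_ref=0.20, se_ref=0.08, b_foc=0.55, se_foc=0.10)
print(f"z = {z:.2f}, p = {p:.4f}")
```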
Comparing Test-Taking Effort Between Paper-Based and Computer-Based Tests
Sebastian Weirich, Karoline A Sachse, Sofie Henschel et al.
The article compares the trajectories of students' self-reported test-taking effort during a 120-minute low-stakes large-scale assessment of English comprehension between a paper-and-pencil assessment (PPA) and a computer-based assessment (CBA). Test...
Corrigendum to "irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT"
[This corrects the article DOI: 10.1177/0146621620921247.] © The Author(s) 2024.
Published Erratum
Applied Psychological Measurement. 2024 Mar;48(1-2):77. DOI: 10.1177/01466216231223043
Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests
Joakim Wallmark, Maria Josefsson, Marie Wiberg
This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were...
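Kernel equating continuizes each form's discrete score distribution with a Gaussian kernel and then links the forms equipercentile-style. The following is a stripped-down Python sketch, assuming a fixed bandwidth and omitting the mean- and variance-preserving rescaling and the log-linear presmoothing step of the full method:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def continuize(scores, probs, h=0.6):
    """Gaussian-kernel continuization of a discrete score distribution.
    Simplified: the full kernel method also rescales to preserve the
    discrete distribution's mean and variance."""
    return lambda x: np.sum(probs * norm.cdf((x - scores) / h))

def kernel_equate(x, scores_x, probs_x, scores_y, probs_y):
    """Equipercentile link: map score x on form X to form Y's scale."""
    Fx = continuize(scores_x, probs_x)
    Gy = continuize(scores_y, probs_y)
    p = Fx(x)
    return brentq(lambda y: Gy(y) - p, scores_y.min() - 5, scores_y.max() + 5)

# Toy score distributions for two forms (illustrative only).
sx = np.arange(6); px = np.array([.05, .15, .30, .25, .15, .10])
sy = np.arange(6); py = np.array([.10, .20, .30, .20, .12, .08])
print(round(kernel_equate(3, sx, px, sy, py), 3))
```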
Using Auxiliary Item Information in the Item Parameter Estimation of a Graded Response Model for a Small to Medium Sample Size: Empirical Versus Hierarchical Bayes Estimation
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bay...
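For reference, the graded response model gives each polytomous item a discrimination parameter and ordered category thresholds; category probabilities come from differences of cumulative logistic curves. A minimal sketch (parameter values are made up):

```python
import numpy as np

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.
    a: discrimination; b: ordered thresholds (length K-1 for K categories)."""
    b = np.asarray(b)
    p_star = 1 / (1 + np.exp(-a * (theta - b)))      # P(X >= k) for k = 1..K-1
    p_star = np.concatenate(([1.0], p_star, [0.0]))  # boundaries for k = 0 and K
    return p_star[:-1] - p_star[1:]                  # P(X = k)

print(np.round(grm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]), 3))
```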
A Bayesian Random Weights Linear Logistic Test Model for Within-Test Practice Effects
José H Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for the measurement of individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of th...
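In a linear logistic test model (LLTM), item difficulty is a linear combination of the cognitive operations an item requires, coded in a Q-matrix; the random-weights extension lets those weights vary across persons. A toy sketch of the fixed-weights decomposition (Q-matrix and weights are hypothetical):

```python
import numpy as np

# Hypothetical Q-matrix: which operations each of three items requires.
Q = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
eta = np.array([0.8, -0.3, 1.1])  # operation difficulty weights (illustrative)

b = Q @ eta  # LLTM: item difficulty as a weighted sum of operation difficulties
# In the random-weights extension, eta gains a person-specific random term,
# so practice on an operation can shift its weight within a single test session.
print(b)
```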
Controlling the Minimum Item Exposure Rate in Computerized Adaptive Testing: A Two-Stage Sympson-Hetter Procedure
Hsiu-Yi Chao, Jyun-Hong Chen
Computerized adaptive testing (CAT) can improve test efficiency, but it also causes unbalanced item usage within a pool. Uneven item exposure rates can not only induce a test security problem due to overexposed ...
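The classic Sympson-Hetter method tempers overexposure with an item-level administration probability that is calibrated by simulation. The sketch below shows the standard single-stage update; the article's two-stage variant, which additionally guards the minimum exposure rate, is not reproduced here:

```python
import random

def sh_filter(item, K):
    """Probabilistic exposure filter: administer the selected item with
    probability K[item]; on failure the CAT reselects another item."""
    return random.random() <= K[item]

def sh_calibrate(K, exposure, r_max=0.20):
    """One Sympson-Hetter calibration sweep over simulated exposure rates:
    items exceeding the ceiling r_max get their control parameter throttled."""
    return {i: min(1.0, k * r_max / exposure[i]) if exposure[i] > r_max else k
            for i, k in K.items()}

K = {1: 1.0, 2: 1.0, 3: 1.0}
exposure = {1: 0.45, 2: 0.18, 3: 0.02}  # illustrative simulated exposure rates
print(sh_calibrate(K, exposure))
```

Note that the standard rule leaves the underexposed item 3 untouched; raising its usage is exactly the gap a minimum-exposure control targets.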
Two Statistics for Measuring the Score Comparability of Computerized Adaptive Tests
Adam E Wyse
This study introduces two new statistics for measuring the score comparability of computerized adaptive tests (CATs) based on comparing conditional standard errors of measurement (CSEMs) for examinees that achieved the same scale scores. On...
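Since the statistics compare conditional standard errors of measurement (CSEMs) among examinees with identical scale scores, one simple illustrative index is the relative dispersion of CSEMs within each score group. This index is an assumption for illustration, not the article's exact statistics:

```python
import numpy as np

def csem_comparability(scale_scores, csems):
    """Within each distinct scale score, summarize how much examinees'
    conditional SEMs differ; large values signal poor score comparability."""
    out = {}
    for s in np.unique(scale_scores):
        group = csems[scale_scores == s]
        out[s] = group.std(ddof=0) / group.mean()  # relative CSEM dispersion
    return out

# Illustrative data: two examinees at 500, three at 510.
scores = np.array([500, 500, 510, 510, 510])
csems  = np.array([28.0, 35.0, 30.0, 31.0, 29.0])
print(csem_comparability(scores, csems))
```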
Does Sparseness Matter? Examining the Use of Generalizability Theory and Many-Facet Rasch Measurement in Sparse Rating Designs
Stefanie A Wind, Eli Jones, Sara Grajeda
Sparse rating designs, where each examinee's performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analyti...
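As a point of contrast between the two frameworks, generalizability theory summarizes rater-related error through variance components: for a person-by-rater design, the relative generalizability coefficient is sigma^2_p / (sigma^2_p + sigma^2_pr / n_r). A one-line sketch with made-up variance components:

```python
def g_coefficient(var_person, var_person_rater, n_raters):
    """Relative generalizability coefficient for a crossed p x r design."""
    return var_person / (var_person + var_person_rater / n_raters)

# Illustrative components: adding a second rater raises the coefficient.
print(round(g_coefficient(var_person=0.50, var_person_rater=0.30, n_raters=2), 3))
```

Estimating the components themselves from a sparse design is the hard part; this sketch assumes they are already in hand.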