Putting Psychology to the Test: Rethinking Model Evaluation Through Benchmarking and Prediction
Consensus on standards for evaluating models and theories is an integral part of every science. Nonetheless, in psychology, relatively little focus has been placed on defining reliable communal metrics to assess model performance. Evaluation practices are ofte... ...