On the Consistency of Automatic Scoring with Large Language Models

Large language models (LLMs) have shown great potential in automatic scoring. However, due to model characteristics and variation in training materials and pipelines, scoring inconsistency can exist within an LLM and across LLMs when rating the same response m... ...

请注册登录后继续浏览