Evaluating large language models as graders of medical short answer questions: a comparative analysis with expert human graders
The assessment of short-answer questions (SAQs) in medical education is resource-intensive, requiring significant expert time. Large Language Models (LLMs) offer potential for automating this process, but their efficacy in specialized medical education assessm... ...