Evaluation of large language models in clinical neuroanatomy: a comparative scoring analysis based on accuracy, concordance, insight, and anatomical terminology accuracy

Background: Large language models (LLMs), such as ChatGPT-4 and Gemini 2.5, are increasingly being evaluated for clinical reasoning and medical education. However, their performance in structured neuroanatomical diagnostic tasks...