Evaluation of large language models in clinical neuroanatomy: a comparative scoring analysis based on accuracy, concordance, insight, and anatomical terminology accuracy

Background: Large language models (LLMs), such as ChatGPT-4 and Gemini 2.5, are increasingly being evaluated for clinical reasoning and medical education. However, their performance in structured neuroanatomical diagnostic tasks ...
