
Journal of Medical Imaging and Radiation Oncology. 2025 Apr 8. doi: 10.1111/1754-9485.13858

Comparative Performance of Anthropic Claude and OpenAI GPT Models in Basic Radiological Imaging Tasks


Cindy Nguyen 1, Daniel Carrion 2, Mohamed K Badawy 1,2

Affiliations

  • 1 Department of Medical Imaging and Radiation Sciences, Monash University, Clayton, Victoria, Australia.
  • 2 Monash Imaging, Monash Health, Clayton, Victoria, Australia.
  • DOI: 10.1111/1754-9485.13858 PMID: 40196917

    Abstract

    Background: Publicly available artificial intelligence (AI) Vision Language Models (VLMs) are constantly improving. The advent of vision capabilities on these models could enhance radiology workflows. Evaluating their performance in radiological image interpretation is vital to their potential integration into practice.

    Aim: This study aims to evaluate the proficiency and consistency of the publicly available VLMs, Anthropic's Claude and OpenAI's GPT, across multiple iterations in basic image interpretation tasks.

    Method: Subsets of the publicly available ROCOv2 and MURAv1.1 datasets were used to evaluate six VLMs. A system prompt and an image were input into each model three times. The outputs were compared with the dataset captions to evaluate each model's accuracy in recognising the modality and anatomy and in detecting fractures on radiographs. The consistency of outputs across iterations was also analysed.
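The scoring described above (caption-matched accuracy, plus agreement across the three iterations per image) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and toy answers are hypothetical, and it assumes model outputs have already been reduced to short categorical answers.

```python
# Hypothetical scoring sketch for the evaluation protocol:
# each model answers each image three times; accuracy compares
# answers with ground-truth captions, and consistency measures
# whether all three iterations of a model agree on an image.

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth label."""
    hits = sum(p == t for p, t in zip(predictions, labels))
    return hits / len(labels)

def consistency(runs):
    """Fraction of images whose three iterations all agree.

    `runs` is a list of per-image tuples, one answer per iteration.
    """
    return sum(len(set(r)) == 1 for r in runs) / len(runs)

if __name__ == "__main__":
    # Toy example: 3 images, one prediction each vs. caption labels.
    preds = ["radiograph", "CT", "radiograph"]
    labels = ["radiograph", "CT", "MRI"]
    print(f"accuracy = {accuracy(preds, labels):.2f}")

    # Toy example: 2 images, 3 iterations each.
    runs = [("wrist", "wrist", "wrist"), ("elbow", "wrist", "elbow")]
    print(f"consistency = {consistency(runs):.2f}")
```

Treating consistency separately from accuracy matters here: a model can be consistently wrong, so the two metrics are reported independently in the Results.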

    Results: Evaluation of the ROCOv2 dataset showed high accuracy in modality recognition, with some models achieving 100%. Anatomical recognition ranged between 61% and 85% accuracy across all models tested. On the MURAv1.1 dataset, Claude-3.5-Sonnet had the highest anatomical recognition with 57% accuracy, while GPT-4o had the best fracture detection with 62% accuracy. Claude-3.5-Sonnet was the most consistent model, with 83% and 92% consistency in anatomy and fracture detection, respectively.

    Conclusion: Given Claude and GPT's current accuracy and reliability, the integration of these models into clinical settings is not yet feasible. This study highlights the need for ongoing development and establishment of standardised testing techniques to ensure these models achieve reliable performance.

    Keywords: AI; Claude; GPT; healthcare; large language models; vision language models.



    Copyright © Journal of Medical Imaging and Radiation Oncology.

    Journal information

    Journal: Journal of Medical Imaging and Radiation Oncology

    Abbreviation: J MED IMAG RADIAT ON

    ISSN: 1754-9477

    e-ISSN: 1754-9485

    IF/Quartile: 2.2/Q4
