Purpose: Recognizing previously unseen classes is a significant challenge for neural networks because of their limited generalization capabilities. This issue is particularly acute in safety-critical domains such as medicine, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by leveraging additional semantic data, and their performance depends heavily on the quality of the generated embeddings.
Methods: This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations and compares them with simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.
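As an illustration of the two embedding steps described above, the following is a minimal sketch, not the authors' implementation: the Sentence-BERT checkpoint (all-MiniLM-L6-v2), the example class sentences, and the choice to normalize each embedding dimension across classes are assumptions not taken from the paper.

```python
# Minimal sketch (assumed details, not the paper's exact pipeline): encode class
# descriptions with a Sentence-BERT model, then apply z-score normalization as a
# post-processing step on the resulting class embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical descriptive sentences for surgical instrument classes.
class_sentences = [
    "A grasper is a long, thin instrument used to hold and retract tissue.",
    "A hook is an L-shaped electrosurgical instrument used for dissection.",
    "A clipper applies metal clips to close the cystic duct and artery.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(class_sentences)          # shape: (n_classes, dim)

# Z-score normalization: zero mean and unit variance per embedding dimension,
# computed across the class embeddings (axis choice is an assumption).
mean = embeddings.mean(axis=0, keepdims=True)
std = embeddings.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero
normalized_embeddings = (embeddings - mean) / std
```

The normalized embeddings would then serve as the semantic class representations against which the visual model's predictions are matched.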
Results: The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9 % to 64.9 %, and the multi-label accuracy from 26.1 % to 79.5 %. Overall performance measured across both seen and unseen classes improves from 49.3 % to 64.9 % in AUROC and from 37.3 % to 65.1 % in multi-label accuracy, highlighting the effectiveness of our approach.
Conclusion: These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.
Keywords: Generalized zero-shot learning; Multi-label classification; Sentence embeddings; Surgical instruments; Z-score normalization.