SurgLaVi: Large-scale hierarchical dataset for surgical vision-language representation learning
Vision-language pre-training (VLP) offers unique advantages for surgery by aligning language with surgical videos, enabling workflow understanding and transfer across tasks without relying on expert-labeled datasets. However, progress in surgical VLP remains c... ...