[Small bowel video keyframe retrieval based on multi-modal contrastive learning]
Retrieving the keyframes most relevant to the input text from small intestine videos with given labels allows pathological regions to be located efficiently and accurately. However, training directly on raw video data is extremely slow, while learning visual representations from imag... ...