RoMo: Robust Unsupervised Multimodal Learning With Noisy Pseudo Labels
{{output}}
The rise of the metaverse and the increasing volume of heterogeneous 2D and 3D data have created a growing demand for cross-modal retrieval, enabling users to query semantically relevant data across different modalities. Existing methods heavily rely on class ... ...