NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models

Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capabilities in understanding relationships between visual and textual data through joint embedding spaces. Despite their effectiveness, these models remain vulnerable to adversarial attac... ...
