NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capabilities in understanding relationships between visual and textual data through joint embedding spaces. Despite their effectiveness, these models remain vulnerable to adversarial attac... ...