Individual & Common Attack: Enhancing Transferability in VLP Models through Modal Feature Exploitation

Vision-Language Pretrained (VLP) models exhibit strong multimodal understanding and reasoning capabilities, finding wide application in tasks such as image-text retrieval and visual grounding. However, they remain highly vulnerable to adversarial attacks, posi... ...
