Individual & Common Attack: Enhancing Transferability in VLP Models through Modal Feature Exploitation

Vision-Language Pretrained (VLP) models exhibit strong multimodal understanding and reasoning capabilities, finding wide application in tasks such as image-text retrieval and visual grounding. However, they remain highly vulnerable to adversarial attacks, posi... ...