Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Large Language Models (LLMs) have demonstrated remarkable success across diverse applications, yet their susceptibility to malicious exploitation remains a critical challenge. Notably, LLMs are known to be vulnerable to jailbreaking attacks, where adversaries ...