
MC#: Mixture Compressor for Mixture-of-Experts Large Models

Mixture-of-Experts (MoE) has emerged as an effective and efficient scaling mechanism for large language models (LLMs) and vision-language models (VLMs). By expanding a single feed-forward network into multiple expert branches, MoE increases model capacity while keeping the per-token computation cost nearly constant, since only a small subset of experts is activated for each input.
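
To make the routing idea concrete, here is a minimal PyTorch sketch of a generic top-k MoE layer: a learned router scores all experts for each token, and only the k highest-scoring expert FFNs are run and mixed. All class and parameter names are illustrative assumptions, not MC#'s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Generic sparse MoE layer: each token is processed by its top-k experts."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router assigns a score per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                  # (T, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)    # top-k experts per token
        weights = F.softmax(weights, dim=-1)          # renormalize gate weights
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Find tokens that routed to expert e (and in which top-k slot).
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += (
                weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
            )
        return out.reshape_as(x)

layer = MoELayer(d_model=64, d_hidden=256, n_experts=8, k=2)
y = layer(torch.randn(2, 10, 64))  # each token runs only 2 of the 8 expert FFNs
```

This is why MoE capacity scales with the number of experts while per-token FLOPs stay roughly fixed: adding experts grows the parameter count, but each token still passes through only k of them.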