Adaptive-expert-weight-based load balance scheme for dynamic routing of MoE

Load imbalance is a major performance bottleneck in training mixture-of-experts (MoE) models, as unbalanced expert loads can lead to routing collapse. Most existing approaches address this issue by introducing auxiliary loss functions to balance the load; however, ...