LoCo: Low-Bit Communication Adaptor for Large-Scale Model Training

To efficiently train large-scale models, low-bit gradient communication compresses full-precision gradients on local GPU nodes into low-precision ones for higher gradient synchronization efficiency among GPU nodes. However, it often degrades training quality d... ...
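
To make the compression step concrete, here is a minimal sketch of generic uniform low-bit quantization applied to a gradient before it is sent to other nodes. It is not the LoCo method itself, whose details are not given in the text above; the function names `quantize` and `dequantize`, the 4-bit width, and the per-tensor scale are all assumptions chosen for illustration.

```python
# Illustrative only: plain symmetric uniform quantization, not LoCo.
# All names and the 4-bit setting are hypothetical choices for this sketch.

def quantize(grad, bits=4):
    """Map full-precision values to signed integer codes in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    max_abs = max((abs(g) for g in grad), default=0.0)
    # One scale per tensor; fall back to 1.0 for an all-zero gradient.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(g / scale))) for g in grad]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate full-precision values on the receiving node."""
    return [c * scale for c in codes]


grad = [0.8, -0.3, 0.1, 0.0]            # local full-precision gradient
codes, scale = quantize(grad, bits=4)   # what would be communicated
restored = dequantize(codes, scale)     # what the receiver reconstructs
error = [g - r for g, r in zip(grad, restored)]  # information lost by rounding

print(codes, scale)
print(error)
```

Only the integer codes and one scale value need to be communicated, which is where the bandwidth saving comes from. The `error` list is the per-element rounding loss, bounded by half of `scale` in this scheme.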