LoCo: Low-Bit Communication Adaptor for Large-Scale Model Training
To efficiently train large-scale models, low-bit gradient communication compresses full-precision gradients on local GPU nodes into low-precision ones for higher gradient synchronization efficiency among GPU nodes. However, it often degrades training quality d... ...
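This minimal sketch shows what compressing a full-precision gradient into a low-precision one can look like. It assumes simple symmetric per-tensor uniform quantization to 4-bit integers. The function names `quantize` and `dequantize`, the 4-bit setting, and the toy gradient values are illustrative assumptions. This is not LoCo's own algorithm. Its purpose is to make visible the rounding error that low-bit compression introduces, which is the kind of error that can degrade training quality.

```python
from typing import List, Tuple


def quantize(grad: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Compress a full-precision gradient into signed `bits`-bit integers.

    Assumption for illustration: symmetric per-tensor uniform quantization,
    where a single float scale is shared by the whole tensor.
    """
    qmax = 2 ** (bits - 1) - 1  # largest representable level, e.g. 7 for 4 bits
    max_abs = max((abs(g) for g in grad), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each entry to the nearest level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(g / scale))) for g in grad]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover an approximate full-precision gradient on the receiving node."""
    return [v * scale for v in q]


# Usage: a toy gradient as it might exist on one local GPU node (hypothetical values).
grad = [0.013, -0.250, 0.071, 0.002, -0.118, 0.199]
q, scale = quantize(grad, bits=4)  # the low-bit payload that would be communicated
recovered = dequantize(q, scale)
error = [g - r for g, r in zip(grad, recovered)]  # per-entry compression error

# Sending 4-bit levels instead of 32-bit floats shrinks the payload roughly 8x,
# ignoring the one shared scale value.
print(q)  # → [0, -7, 2, 0, -3, 6]
print([round(e, 4) for e in error])
```

The small entries 0.013 and 0.002 are rounded to zero, so their information is lost at every synchronization step. A single per-tensor scale is the simplest design. Finer-grained scales (for example, one per block of entries) usually reduce this error, at the cost of communicating more scale values.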