Two-phase perspective on deep learning dynamics

We propose that learning in deep neural networks proceeds in two phases: a rapid curve-fitting phase followed by a slower compression or coarse-graining phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent, ...