首页 正文

Noise balance and stationary distribution of stochastic gradient descent

{{output}}
The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we show that the... ...