Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks
Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions a...
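The notion of a strict saddle point referenced above can be illustrated on a toy objective. The following sketch (not part of the paper; the function, step size, and starting point are illustrative choices) shows gradient descent escaping the strict saddle of f(x, y) = x² − y² at the origin, where the Hessian has one positive and one negative eigenvalue:

```python
import numpy as np

# Toy strict saddle: f(x, y) = x^2 - y^2 has a critical point at the
# origin with Hessian eigenvalues +2 and -2, i.e. a strict saddle.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1.0, 1e-6])  # start almost on the stable manifold {y = 0}
eta = 0.1                  # step size

for _ in range(200):
    p = p - eta * grad(p)

# The x-coordinate contracts toward 0, while the tiny y-component is
# amplified by the factor (1 + 2*eta) each step along the direction of
# negative curvature, so the iterates leave any neighborhood of the saddle.
print(abs(p[0]) < 1e-10, abs(p[1]) > 1.0)
```

Unless the initial point lies exactly on the stable manifold (here, the x-axis), any perturbation in the negative-curvature direction grows geometrically, which is the mechanism behind the avoidance-of-saddle-points results the abstract alludes to.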