Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks
Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions a...
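The notion of a strict saddle point referenced above can be illustrated on a toy objective. The following sketch (not part of the paper; the function, step size, and starting point are illustrative choices) shows gradient descent escaping the strict saddle of f(x, y) = x² − y² at the origin, where the Hessian has one positive and one negative eigenvalue:

```python
import numpy as np

# Toy strict saddle: f(x, y) = x^2 - y^2 has a critical point at the
# origin with Hessian eigenvalues +2 and -2, i.e. a strict saddle.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1.0, 1e-6])  # start almost on the stable manifold {y = 0}
eta = 0.1                  # step size

for _ in range(200):
    p = p - eta * grad(p)

# The x-coordinate contracts toward 0, while the tiny y-component is
# amplified by the factor (1 + 2*eta) each step along the direction of
# negative curvature, so the iterates leave any neighborhood of the saddle.
print(abs(p[0]) < 1e-10, abs(p[1]) > 1.0)
```

Unless the initial point lies exactly on the stable manifold (here, the x-axis), any perturbation in the negative-curvature direction grows geometrically, which is the mechanism behind the avoidance-of-saddle-points results the abstract alludes to.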