Geometric dynamics of signal propagation predict trainability of transformers

We investigate forward signal propagation and gradient backpropagation in deep, randomly initialized transformers, and we use our analysis to propose simple and geometrically meaningful criteria for hyperparameter initialization that ensure trainability of d... ...
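The abstract describes tracking how the geometry of token representations evolves with depth at random initialization. The sketch below shows one common forward-side diagnostic under stated assumptions. It measures the mean pairwise cosine similarity between tokens after each layer of a randomly initialized stack. Values that drift toward 1 indicate tokens collapsing onto one direction. The stack is a simplified, hypothetical transformer: single-head attention plus a ReLU MLP, residual branches scaled by `alpha`, query/key weights scaled by `sigma_qk`, and a per-token RMS normalization. These names, scalings, and the normalization placement are illustrative assumptions, not the paper's criteria. The sketch does not cover gradient backpropagation.

```python
import math
import random


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both lists of lists."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def gaussian_matrix(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def rms_normalize(x):
    """Rescale each token (row) to unit root-mean-square magnitude."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row))
        out.append([v / rms for v in row])
    return out


def mean_cosine(x):
    """Mean cosine similarity over all pairs of distinct tokens."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in x]
    total, count = 0.0, 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(x[i], x[j]))
            total += dot / (norms[i] * norms[j])
            count += 1
    return total / count


def add_scaled(x, y, alpha):
    """Residual update x + alpha * y, elementwise."""
    return [[a + alpha * b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]


def propagate(n_tokens=8, width=32, depth=6, alpha=0.5, sigma_qk=1.0, seed=0):
    """Return mean token cosine similarity at the input and after each layer.

    alpha and sigma_qk are hypothetical knobs for this sketch: residual
    branch strength and query/key weight scale, respectively.
    """
    rng = random.Random(seed)
    x = rms_normalize(gaussian_matrix(n_tokens, width, 1.0, rng))
    history = [mean_cosine(x)]
    std = 1.0 / math.sqrt(width)
    for _ in range(depth):
        # Self-attention branch with freshly sampled random weights.
        wq = gaussian_matrix(width, width, sigma_qk * std, rng)
        wk = gaussian_matrix(width, width, sigma_qk * std, rng)
        wv = gaussian_matrix(width, width, std, rng)
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scores = matmul(q, [list(col) for col in zip(*k)])
        attn = [softmax([s / math.sqrt(width) for s in row]) for row in scores]
        x = add_scaled(x, matmul(attn, v), alpha)
        # ReLU MLP branch (He-style scale on the first layer).
        w1 = gaussian_matrix(width, width, math.sqrt(2.0) * std, rng)
        w2 = gaussian_matrix(width, width, std, rng)
        h = [[max(0.0, val) for val in row] for row in matmul(x, w1)]
        x = add_scaled(x, matmul(h, w2), alpha)
        # Per-token normalization keeps magnitudes bounded across depth.
        x = rms_normalize(x)
        history.append(mean_cosine(x))
    return history
```

For example, `propagate(alpha=0.0)` leaves every layer an identity map up to rescaling, so the cosine similarity stays at its input value. Sweeping `alpha` and `sigma_qk` and plotting the returned history against depth is one way to see whether token geometry is preserved or collapses. Because normalization is scale-invariant for cosine similarity, the diagnostic isolates changes in angle rather than norm.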