We published a new preprint on “Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime”.
Considering a Variable Projection or two-timescale learning strategy, we show that, during training, the distribution of the inner weights of a two-layer neural network evolves according to an ultra-fast diffusion equation.
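
For readers unfamiliar with Variable Projection, the idea can be sketched in a few lines: for fixed inner weights, the outer weights solve a linear least-squares problem, so they can be eliminated in closed form and gradient descent is run on the inner weights only. The snippet below is a rough illustration and not the code or the exact setting of the preprint. The tanh activation, the ridge penalty `lam`, the step size, and all function names (`solve_outer`, `reduced_loss_and_grad`, `train`) are assumptions made for this example.

```python
import numpy as np


def solve_outer(Phi, y, lam):
    """Ridge least-squares solution for the outer weights, given features Phi (n x m)."""
    n, m = Phi.shape
    A = Phi.T @ Phi / n + lam * np.eye(m)
    return np.linalg.solve(A, Phi.T @ y / n)


def reduced_loss_and_grad(W, X, y, lam):
    """Loss after eliminating the outer weights, and its gradient w.r.t. inner weights W."""
    n = X.shape[0]
    Phi = np.tanh(X @ W.T)             # hidden features, shape (n, m); tanh is an assumption
    u = solve_outer(Phi, y, lam)       # optimal outer weights for the current W
    r = Phi @ u - y                    # residual, shape (n,)
    loss = 0.5 * np.mean(r ** 2) + 0.5 * lam * (u @ u)
    # Since u is optimal, the derivative through u vanishes (envelope argument),
    # so only the explicit dependence of the features on W contributes.
    G = (r[:, None] * u[None, :]) * (1.0 - Phi ** 2) / n   # dLoss/d(pre-activation), (n, m)
    return loss, G.T @ X               # gradient has the shape of W, (m, d)


def train(X, y, m=20, lam=1e-2, lr=0.1, steps=200, seed=0):
    """Plain gradient descent on the inner weights of the reduced problem."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(m, X.shape[1]))
    for _ in range(steps):
        _, grad = reduced_loss_and_grad(W, X, y, lam)
        W -= lr * grad
    return W
```

As a usage example under the same assumptions, one can build toy inputs with a bias column, `X = np.column_stack([np.linspace(-1, 1, 64), np.ones(64)])`, set `y = np.sin(3 * X[:, 0])`, and call `W = train(X, y)`. The rows of `W` then play the role of samples from the inner-weight distribution whose evolution is studied in the preprint.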