We have published a new preprint, “Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime”.
Using a Variable Projection [1] or two-timescale learning [2] strategy, we show that, during training, the distribution of the inner weights of a two-layer neural network evolves according to an ultra-fast diffusion equation [3].
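
To make the setting concrete, here is a minimal sketch in LaTeX with illustrative notation (not taken from the preprint): a two-layer network whose outer weights are fit on the fast timescale, as in Variable Projection, together with the generic form of an ultra-fast diffusion equation; the weighted variant studied in [3] and the exact exponent appearing in our result are spelled out in the preprint.

% Two-layer network; a_j are the outer weights, w_j the inner weights (illustrative notation).
\[
  f_\theta(x) \;=\; \sum_{j=1}^{n} a_j\,\sigma\!\big(\langle w_j, x\rangle\big),
  \qquad
  a(t) \;\approx\; \arg\min_{a}\, L\big(a, w(t)\big)
  \quad \text{(outer layer trained on the fast timescale).}
\]
% Generic (unweighted) ultra-fast diffusion equation for the inner-weight
% distribution \mu_t; "ultra-fast" refers to a negative exponent m.
\[
  \partial_t \mu_t \;=\; \Delta\big(\mu_t^{\,m}\big), \qquad m < 0.
\]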

References
[1] G. H. Golub, V. Pereyra. The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis (1973).
[2] P. Marion, R. Berthier. Leveraging the two-timescale regime to demonstrate convergence of neural networks. Advances in Neural Information Processing Systems (2023).
[3] M. Iacobelli, F. S. Patacchini, F. Santambrogio. Weighted ultrafast diffusion equations: from well-posedness to long-time behaviour. Archive for Rational Mechanics and Analysis (2019).
