M.Dagreou, **P.Ablin**, S.Vaiter and T.Moreau. *A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.* NeurIPS 2022. **Link**

M.Sander, **P.Ablin** and G.Peyré. *Do Residual Neural Networks discretize Neural Ordinary Differential Equations?* NeurIPS 2022. **Link**

T.Moreau et al. *Benchopt: Reproducible, efficient and collaborative optimization benchmarks.* NeurIPS 2022. **Link**

**P.Ablin** and G.Peyré. *Fast and accurate optimization on the orthogonal manifold without retraction.* AISTATS 2022. **Link**

M.Sander, **P.Ablin**, M.Blondel and G.Peyré. *Sinkformers: Transformers with doubly stochastic attention.* AISTATS 2022. **Link**

H.Richard, **P.Ablin**, B.Thirion, A.Gramfort and A.Hyvärinen. *Shared Independent Component Analysis for Multi-Subject Neuroimaging.* NeurIPS 2021. **Link**

M.Sander, **P.Ablin**, M.Blondel and G.Peyré. *Momentum residual neural networks.* ICML 2021. **Link**

A.Korba, PC.Aubin, S.Majewski and **P.Ablin**. *Kernel Stein Discrepancy descent.* Accepted for a **long talk** at ICML 2021. **Link**

**P.Ablin**, JF Cardoso and A.Gramfort. *Spectral independent component analysis with noise modelling for M/EEG source separation.* Journal of Neuroscience Methods, 2021. **Link**

**P.Ablin**. *Deep orthogonal linear networks are shallow.* Note. Link

Deep linear networks consist of a succession of matrix multiplications, and they are trained by gradient descent on a loss function with respect to each factor. We show that when the factors are orthogonal matrices trained with Riemannian gradient descent, the network is exactly equivalent to a depth-one, shallow network: adding more depth and more parameters changes nothing for the global factorization.

H.Richard, L.Gresele, A.Hyvärinen, B.Thirion, A.Gramfort and **P.Ablin**. *Modeling Shared Responses in Neuroimaging Studies through MultiView ICA.* Accepted for a **spotlight** at NeurIPS 2020. Link

**P. Ablin**, G.Peyré and T. Moreau. *Super-efficiency of automatic differentiation for functions defined as a minimum.* ICML 2020. Link

How can you compute the gradient of a function defined as a minimum? There are three usual ways to estimate it from a minimizing sequence. We analyse the convergence speed of these methods and find a super-efficiency phenomenon for the automatic-differentiation estimator.
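As a rough illustration, the sketch below compares three such estimators on a toy quadratic problem. The names "analytic", "automatic" and "implicit", the objective `L(z, x)`, and the use of plain gradient descent are assumptions chosen for this example; they are not stated in the summary above. The Jacobian of the iterates is propagated by hand instead of relying on an autodiff library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
C = rng.standard_normal((n, n))
A = C @ C.T + np.eye(n)              # symmetric positive definite matrix
x = rng.standard_normal(n)

# Toy objective (assumption): L(z, x) = 0.5 * z^T A z - x^T z.
# Then h(x) = min_z L(z, x) is reached at z* = A^{-1} x and grad h(x) = -z*.
z_star = np.linalg.solve(A, x)
true_grad = -z_star

step = 1.0 / np.linalg.eigvalsh(A).max()
z = np.zeros(n)
J = np.zeros((n, n))                 # Jacobian dz_t/dx of the current iterate
for _ in range(50):
    z = z - step * (A @ z - x)               # gradient descent step on z
    J = J - step * (A @ J - np.eye(n))       # same step, differentiated w.r.t. x

grad_z = A @ z - x                   # gradient of L w.r.t. z at the last iterate

# Analytic: plug the approximate minimizer into grad_x L = -z.
g_analytic = -z
# Automatic: differentiate through the iterations (chain rule with J).
g_automatic = -z + J.T @ grad_z
# Implicit: correct with the Hessian; here d(grad_x L)/dz = -I and the Hessian is A.
g_implicit = -z + np.linalg.solve(A, grad_z)

err_analytic = np.linalg.norm(g_analytic - true_grad)
err_automatic = np.linalg.norm(g_automatic - true_grad)
err_implicit = np.linalg.norm(g_implicit - true_grad)
print(err_analytic, err_automatic, err_implicit)
```

On this quadratic toy problem the implicit estimator is exact up to rounding, which is a special case; the automatic estimator's error is the product of two converging terms, so it shrinks faster than the analytic one.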

D.Sabbagh, **P. Ablin**, G.Varoquaux, A. Gramfort and D.A. Engemann. *Predictive regression modeling with MEG/EEG: from source power to signals and cognitive states.* NeuroImage (2020). Link

D.Sabbagh, **P. Ablin**, G.Varoquaux, A. Gramfort and D.A. Engemann. *Manifold regression to predict from M/EEG signals without source model*. NeurIPS 2019. Link

How can you predict the age of a person from M/EEG signals? The information is first summarized by computing the covariance matrices of the signals, which are positive matrices: we then have to perform regression on the manifold of positive matrices. We investigate different Riemannian and non-Riemannian methods to do so that can cope with rank-deficient data.
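One possible pipeline of this kind can be sketched as follows: compute one covariance per subject, map it to a vector with the matrix logarithm, and fit a ridge regression. This is only an illustrative variant on synthetic data, not necessarily one of the methods studied in the paper. The function names, the zero-mean signal assumption and the ridge penalty are hypothetical, and the sketch assumes full-rank covariances, so it does not handle the rank-deficient case.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(C)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def tangent_features(signals):
    """Map signals (n_subjects, n_channels, n_times) to log-covariance vectors."""
    n_channels = signals.shape[1]
    iu = np.triu_indices(n_channels)
    # Weight off-diagonal terms by sqrt(2) so the vector keeps the Frobenius norm.
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    feats = []
    for X in signals:
        C = X @ X.T / X.shape[1]     # covariance, assuming zero-mean signals
        feats.append(weights * spd_log(C)[iu])
    return np.array(feats)

def ridge_fit(F, y, alpha=1.0):
    """Ridge regression with intercept; returns (weights, intercept)."""
    F_mean, y_mean = F.mean(axis=0), y.mean()
    Fc = F - F_mean
    w = np.linalg.solve(Fc.T @ Fc + alpha * np.eye(F.shape[1]), Fc.T @ (y - y_mean))
    return w, y_mean - F_mean @ w

# Synthetic data standing in for M/EEG recordings and ages.
rng = np.random.default_rng(0)
signals = rng.standard_normal((30, 4, 200))
ages = rng.uniform(20, 80, size=30)

features = tangent_features(signals)
w, b = ridge_fit(features, ages)
predictions = features @ w + b
```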

**P. Ablin**, T.Moreau, M.Massias and A. Gramfort. *Learning step sizes for unfolded sparse coding*. NeurIPS 2019. Link

The ISTA algorithm is one of the most popular algorithms for linear regression with the LASSO. This algorithm can be formally seen as a residual neural network, which can then be trained to accelerate its convergence. We show that, in order to converge, the last layers of such a network can only learn a better step size: the set of parameters is restricted to a one-dimensional manifold.
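To make the "ISTA as a network" view concrete, here is a minimal sketch in which each iteration is a layer with a single scalar step size. Only the forward pass is shown; the function names and the random data are assumptions for illustration, and the step sizes are fixed to the classical value 1/L rather than learned.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(D, x, z, lam):
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))

def unrolled_ista(D, x, lam, step_sizes):
    """Run one ISTA 'layer' per entry of step_sizes, starting from z = 0."""
    z = np.zeros(D.shape[1])
    for alpha in step_sizes:
        # Gradient step on the quadratic term, then soft-thresholding.
        z = soft_threshold(z - alpha * D.T @ (D @ z - x), alpha * lam)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
x = rng.standard_normal(20)
lam = 0.1 * np.max(np.abs(D.T @ x))          # below the value where z = 0 is optimal

lipschitz = np.linalg.norm(D, 2) ** 2        # largest eigenvalue of D^T D
z_hat = unrolled_ista(D, x, lam, step_sizes=[1.0 / lipschitz] * 100)
print(lasso_objective(D, x, z_hat, lam))
```

In a trained version, the entries of `step_sizes` would be the learnable parameters, one scalar per layer.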

**P. Ablin**, J.F. Cardoso and A. Gramfort. *Beyond Pham’s algorithm for joint diagonalization*. ESANN 2019. Link. **Best student paper award**

The joint diagonalization of a set of matrices consists in finding a joint linear transform of the matrices which yields matrices as diagonal as possible. Pham’s algorithm is the most popular (and one of the only) algorithms for joint diagonalization of a set of positive definite matrices. We propose a fast quasi-Newton method, with updates as costly as a gradient descent step, which converges at quadratic speed when the set of matrices is jointly diagonalizable. The algorithm is about 10 times faster than Pham’s algorithm.
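The sketch below illustrates what "as diagonal as possible" can mean for positive definite matrices, using a log-determinant criterion that is non-negative and vanishes exactly when every transformed matrix is diagonal (Hadamard's inequality). The choice of this criterion and all names are assumptions for this example, and the quasi-Newton solver itself is not implemented here.

```python
import numpy as np

def joint_diag_loss(B, matrices):
    """Average of 0.5 * [log det diag(B C B^T) - log det(B C B^T)] over the set."""
    total = 0.0
    for C in matrices:
        M = B @ C @ B.T
        _, logdet = np.linalg.slogdet(M)
        total += np.sum(np.log(np.diag(M))) - logdet
    return total / (2 * len(matrices))

# Build a jointly diagonalizable set: C_i = A diag(d_i) A^T with positive d_i.
rng = np.random.default_rng(0)
p = 4
A = np.eye(p) + 0.1 * rng.standard_normal((p, p))    # well-conditioned mixing matrix
matrices = [A @ np.diag(rng.uniform(1, 2, size=p)) @ A.T for _ in range(10)]

loss_identity = joint_diag_loss(np.eye(p), matrices)          # no transform
loss_unmixed = joint_diag_loss(np.linalg.inv(A), matrices)    # exact joint diagonalizer
print(loss_identity, loss_unmixed)
```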

**P. Ablin**, D. Fagot, H. Wendt, A. Gramfort and C. Févotte. *A Quasi-Newton algorithm on the orthogonal manifold for NMF with transform learning.* ICASSP 2019. Link

**P. Ablin**, A. Gramfort, J.F. Cardoso and F. Bach. *Stochastic algorithms with descent guarantees for ICA.* AISTATS 2019. Link. **Accepted for oral presentation**

We propose the first ICA algorithm that is stochastic, i.e. processes only a mini-batch of samples at a time, and has descent guarantees: a surrogate loss function is guaranteed to decrease after each mini-batch is processed.

**P. Ablin**, J.F. Cardoso and A. Gramfort. *Accelerating Likelihood Optimization for ICA on Real Signals*. LVA-ICA ’18. Link. **Best student paper award**

**P. Ablin**, J.F. Cardoso and A. Gramfort. *Faster ICA under orthogonal constraint.* ICASSP’18. Link

We extend the Picard algorithm to the orthogonal constraint. The algorithm is a preconditioned L-BFGS on the orthogonal manifold, using Hessian approximations that are cheap to compute and exact when the unmixed signals are independent. The algorithm is faster than FastICA when applied to real datasets.

**P. Ablin**, J.F. Cardoso and A. Gramfort. *Faster independent component analysis by preconditioning with Hessian approximations.* IEEE Transactions on Signal Processing (2017). Link

We propose the Picard algorithm for fast ICA, with a focus on practical applications and real data. The algorithm is a quasi-Newton method with two stages. First, a cheap Hessian approximation is computed; this approximation matches the true Hessian when the unmixed signals are perfectly independent. Since perfect independence never occurs in practice, we then refine this approximation using the L-BFGS algorithm. The algorithm is very fast on real datasets.

**P. Ablin** and K. Siddiqi. *Detecting Myocardial Infarction Using Medial Surfaces.* International Workshop on Statistical Atlases and Computational Models of the Heart (2015).