Offer related to the Action/Network: – — –/– — –
Laboratory/Company: INRIA Opis
Duration: 5 months
Contact: laurent.duval@ifpen.fr
Publication deadline: 2025-04-04
Context:
Within the ERC MAJORIS project, and in collaboration with IFPEN, this internship investigates the problem of sparse principal component analysis (PCA) with norm-ratio sparsifying penalties. Online information:
https://jobs.inria.fr/public/classic/fr/offres/2024-08488
Subject:
Principal component analysis (PCA) is a workhorse of linear dimensionality reduction [Jol02]. It is widely applied in exploratory data analysis, visualization, and data preprocessing.
Principal components are usually linear combinations of all input variables. For high-dimensional data, they may therefore involve input variables that contribute very little to understanding the data. It is desirable to find the few directions in space that best explain the observations. Sparse PCA overcomes this drawback by adding sparsity constraints, so that each linear combination contains only a few input variables [CR24,ZX18]. One such constraint is formulated, as in the lasso, with an absolute-norm penalty/regularization. In [MBPS10], this matrix factorization problem is posed as:
minimize_{alpha} ||X - D alpha||_F^2 + lambda ||alpha||_{1,1}
where X = [x_1,…,x_n] is the matrix of data vectors; D is a square matrix from a suitable basis set; ||.||_F denotes the Frobenius norm; ||.||_{1,1} denotes the sum of the magnitudes of the matrix coefficients; and lambda is a positive penalty weight.
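As an illustration only, the objective above can be evaluated and minimized over alpha with a standard proximal gradient scheme (ISTA, i.e. gradient steps followed by soft-thresholding). This is a minimal sketch under stated assumptions: D is held fixed and taken as a random orthonormal basis, the data are synthetic, and the helper names (objective, soft_threshold, ista) are hypothetical. It is neither the online algorithm of [MBPS10] nor the method to be developed during the internship.

```python
import numpy as np

def objective(X, D, A, lam):
    """Value of ||X - D A||_F^2 + lam * ||A||_{1,1}."""
    residual = X - D @ A
    return np.sum(residual ** 2) + lam * np.sum(np.abs(A))

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_{1,1}: entrywise shrinkage toward 0."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def ista(X, D, lam, n_iter=200):
    """Proximal gradient (ISTA) on alpha, with D held fixed (assumption)."""
    A = np.zeros((D.shape[1], X.shape[1]))
    # Lipschitz constant of the gradient of A -> ||X - D A||_F^2
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

# Synthetic data (assumption): one spike of amplitude 3 per column, small noise.
rng = np.random.default_rng(0)
D = np.linalg.qr(rng.standard_normal((8, 8)))[0]  # hypothetical orthonormal basis
A_true = np.zeros((8, 5))
A_true[rng.integers(0, 8, size=5), np.arange(5)] = 3.0
X = D @ A_true + 0.01 * rng.standard_normal((8, 5))

lam = 0.5
A_hat = ista(X, D, lam)
print("objective:", objective(X, D, A_hat, lam))
print("non-zero coefficients:", np.count_nonzero(A_hat))
```

The soft-thresholding step is what sets small coefficients exactly to zero; a larger lambda removes more of them, at the price of a stronger bias on the surviving coefficients.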
A penalty such as ||.||_{1,1} is 1-homogeneous: scaling the matrix by a factor c multiplies the penalty by |c|. It therefore only weakly emulates the count of non-zero entries of a matrix, which is scale-invariant (0-homogeneous).
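The difference in homogeneity can be checked numerically. The short sketch below uses a small hand-made matrix and hypothetical helper names (l11_norm, l0_count); it only illustrates the scaling behaviour described above.

```python
import numpy as np

def l11_norm(A):
    """Sum of the magnitudes of all entries (1-homogeneous)."""
    return np.sum(np.abs(A))

def l0_count(A):
    """Number of non-zero entries (0-homogeneous, i.e. scale-invariant)."""
    return np.count_nonzero(A)

A = np.array([[0.0, 2.0, 0.0],
              [-1.0, 0.0, 0.5]])
c = 10.0

# Rescaling A by c multiplies the l1,1 penalty by |c| ...
print(l11_norm(c * A) / l11_norm(A))
# ... but leaves the count of non-zero entries unchanged.
print(l0_count(c * A), l0_count(A))
```

With an l1-type penalty, shrinking all coefficients lowers the penalty without making the matrix any sparser, which is the weakness a scale-invariant measure avoids.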
Recently, the SOOT/SPOQ family of penalties has been developed in our research group as smooth surrogates for the scale-invariant lp/lq norm ratios. Such ratios have long been used as stopping criteria, penalties, or “continuous” estimators of the sparsity count [HR09]. These penalties have been applied successfully to the restoration, deconvolution, and source separation of sparse signals [CCDP20,RPD+15].
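To make the notion of a norm ratio concrete, the sketch below computes the plain (non-smoothed) lp/lq ratio of a vector, with p = 1 and q = 2 by default. It is not the SOOT or SPOQ penalty, which add smoothing terms described in [RPD+15] and [CCDP20]; the function name norm_ratio and the test vectors are hypothetical. For a non-zero vector with n entries, the l1/l2 ratio lies between 1 (a single spike) and sqrt(n) (all entries of equal magnitude).

```python
import numpy as np

def norm_ratio(x, p=1.0, q=2.0):
    """Plain lp/lq ratio of a non-zero vector (no smoothing)."""
    x = np.abs(np.asarray(x, dtype=float))
    lp = np.sum(x ** p) ** (1.0 / p)
    lq = np.sum(x ** q) ** (1.0 / q)
    return lp / lq

spike = np.array([0.0, 0.0, 4.0, 0.0])  # one non-zero entry
flat = np.array([1.0, 1.0, 1.0, 1.0])   # four equal entries

print(norm_ratio(spike))  # smallest possible value: 1
print(norm_ratio(flat))   # largest possible value for n = 4: sqrt(4) = 2
# Rescaling the vector does not change the ratio (0-homogeneity).
print(np.isclose(norm_ratio(7.0 * flat), norm_ratio(flat)))
```

The ratio is undefined at the zero vector and is not differentiable wherever an entry vanishes, which is one motivation for smoothed variants.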
The goal of the internship is to investigate the resolution of sparse PCA models in which the standard l1 norm is replaced with such norm ratios. The work will include a convergence analysis of the proposed optimization algorithm, its implementation, and its validation on public benchmarks.
[CCDP20] Afef Cherni, Emilie Chouzenoux, Laurent Duval, and Jean-Christophe Pesquet. SPOQ ℓp-over-ℓq regularization for sparse signal recovery applied to mass spectrometry. IEEE Trans. Signal Process., 68:6070–6084, 2020.
[CR24] Fan Chen and Karl Rohe. A new basis for sparse principal component analysis. J. Comp. Graph. Stat., 33(2):421–434, 2024.
[HR09] N. Hurley and S. Rickard. Comparing measures of sparsity. IEEE Trans. Inform. Theory, 55(10):4723–4741, Oct. 2009.
[Jol02] I. T. Jolliffe. Principal component analysis. Springer Series in Statistics, 2nd edition, 2002.
[MBPS10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res., 11:19–60, 2010.
[RPD+15] A. Repetti, M. Q. Pham, L. Duval, E. Chouzenoux, and J.-C. Pesquet. Euclid in a taxicab: Sparse blind deconvolution with smoothed ℓ1/ℓ2 regularization. IEEE Signal Process. Lett., 22(5):539–543, May 2015.
[ZCD23] Paul Zheng, Emilie Chouzenoux, and Laurent Duval. PENDANTSS: PEnalized Norm-ratios Disentangling Additive Noise, Trend and Sparse Spikes. IEEE Signal Process. Lett., 30:215–219, 2023.
[ZX18] Hui Zou and Lingzhou Xue. A selective overview of sparse principal component analysis. Proc. IEEE, 106(8):1311–1320, August 2018.
Candidate profile:
We are seeking a talented candidate enrolled in a Master 1, Master 2, or engineering program.
Required education and skills:
A solid background in optimization and signal processing, and a strong motivation for research and innovation. Experience in Python is required.
Work address:
INRIA Saclay
Attached document: 202501132057_main-IFPEN-INRIA-master-pca-spoq-sparse-revisited.pdf