TU Darmstadt / ULB / TUbiblio

Sparse PCA with False Discovery Rate Controlled Variable Selection

Machkour, Jasin ; Breloy, Arnaud ; Muma, Michael ; Palomar, Daniel P. ; Pascal, Frédéric (2024)
Sparse PCA with False Discovery Rate Controlled Variable Selection.
doi: 10.48550/arXiv.2401.08375
Report, Bibliographie

Kurzbeschreibung (Abstract)

Sparse principal component analysis (PCA) aims at mapping large dimensional data to a linear subspace of lower dimension. By imposing loading vectors to be sparse, it performs the double duty of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., number of selected variables). As a high explained variance is not necessarily synonymous with relevant information, these methods are prone to select irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.

Typ des Eintrags: Report
Erschienen: 2024
Autor(en): Machkour, Jasin ; Breloy, Arnaud ; Muma, Michael ; Palomar, Daniel P. ; Pascal, Frédéric
Art des Eintrags: Bibliographie
Titel: Sparse PCA with False Discovery Rate Controlled Variable Selection
Sprache: Englisch
Publikationsjahr: 16 Januar 2024
Verlag: arXiV
Reihe: Machine Learning
Auflage: 1. Version
DOI: 10.48550/arXiv.2401.08375
Kurzbeschreibung (Abstract):

Sparse principal component analysis (PCA) aims at mapping large dimensional data to a linear subspace of lower dimension. By imposing loading vectors to be sparse, it performs the double duty of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., number of selected variables). As a high explained variance is not necessarily synonymous with relevant information, these methods are prone to select irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.

Zusätzliche Informationen:

Preprint; Published in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), scheduled for 14-19 April 2024 in Seoul, Korea

Fachbereich(e)/-gebiet(e): 18 Fachbereich Elektrotechnik und Informationstechnik
18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik
18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik > Robust Data Science
LOEWE
LOEWE > LOEWE-Zentren
LOEWE > LOEWE-Zentren > emergenCITY
Zentrale Einrichtungen
Zentrale Einrichtungen > Hochschulrechenzentrum (HRZ)
Zentrale Einrichtungen > Hochschulrechenzentrum (HRZ) > Hochleistungsrechner
Hinterlegungsdatum: 03 Apr 2024 11:44
Letzte Änderung: 03 Apr 2024 11:44
PPN:
Export:
Suche nach Titel in: TUfind oder in Google
Frage zum Eintrag Frage zum Eintrag

Optionen (nur für Redakteure)
Redaktionelle Details anzeigen Redaktionelle Details anzeigen