Machkour, Jasin ; Breloy, Arnaud ; Muma, Michael ; Palomar, Daniel P. ; Pascal, Frédéric (2024)
Sparse PCA with False Discovery Rate Controlled Variable Selection.
doi: 10.48550/arXiv.2401.08375
Report, Bibliography
Abstract
Sparse principal component analysis (PCA) aims at mapping high-dimensional data to a linear subspace of lower dimension. By constraining the loading vectors to be sparse, it performs the double duty of dimension reduction and variable selection. Sparse PCA algorithms are usually expressed as a trade-off between explained variance and sparsity of the loading vectors (i.e., the number of selected variables). As high explained variance is not necessarily synonymous with relevant information, these methods are prone to selecting irrelevant variables. To overcome this issue, we propose an alternative formulation of sparse PCA driven by the false discovery rate (FDR). We then leverage the Terminating-Random Experiments (T-Rex) selector to automatically determine an FDR-controlled support of the loading vectors. A major advantage of the resulting T-Rex PCA is that no sparsity parameter tuning is required. Numerical experiments and a stock market data example demonstrate a significant performance improvement.
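The explained-variance versus sparsity trade-off described in the abstract can be illustrated with a conventional sparse PCA baseline. The sketch below is not T-Rex PCA and performs no FDR control. It is a minimal truncated power method, a standard sparse PCA heuristic swapped in here for illustration, that keeps exactly `k` nonzero loadings. The helper name `truncated_power_sparse_pc`, the synthetic data, and the values of `k` are hypothetical.

```python
import numpy as np


def truncated_power_sparse_pc(cov, k, n_iter=100):
    """Hypothetical helper: leading sparse loading vector with exactly k nonzeros.

    Truncated power method: multiply by the covariance, keep the k
    largest-magnitude entries, renormalize. Requires 1 <= k <= p.
    """
    p = cov.shape[0]
    # Warm start from the dense leading eigenvector (eigh sorts eigenvalues ascending).
    _, vecs = np.linalg.eigh(cov)
    v = vecs[:, -1].copy()
    for _ in range(n_iter):
        w = cov @ v
        # Zero out the p - k smallest-magnitude entries (the sparsity constraint).
        w[np.argsort(np.abs(w))[: p - k]] = 0.0
        v = w / np.linalg.norm(w)
    return v


# Hypothetical synthetic data: variables 0-4 share one latent factor,
# variables 5-19 are pure noise.
rng = np.random.default_rng(1)
n, p = 500, 20
factor = rng.standard_normal(n)
X = 0.5 * rng.standard_normal((n, p))
X[:, :5] += factor[:, None]
cov = np.cov(X, rowvar=False)

# The sparsity level k must be chosen by hand: this is the tuning step
# the abstract says T-Rex PCA avoids.
for k in (2, 5, 10):
    v = truncated_power_sparse_pc(cov, k)
    support = np.flatnonzero(v).tolist()
    explained = float(v @ cov @ v)
    print(f"k={k:2d} support={support} explained variance={explained:.3f}")
```

On this synthetic example one would expect `k=5` to recover variables 0-4, while `k=10` also selects five noise variables for only a small gain in explained variance, which is the failure mode the abstract attributes to variance-driven sparse PCA.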
Entry type: | Report |
---|---|
Published: | 2024 |
Author(s): | Machkour, Jasin ; Breloy, Arnaud ; Muma, Michael ; Palomar, Daniel P. ; Pascal, Frédéric |
Entry kind: | Bibliography |
Title: | Sparse PCA with False Discovery Rate Controlled Variable Selection |
Language: | English |
Publication date: | 16 January 2024 |
Publisher: | arXiv |
Series: | Machine Learning |
Edition: | Version 1 |
DOI: | 10.48550/arXiv.2401.08375 |
Additional information: | Preprint; published in ICASSP 2024, the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, scheduled for 14-19 April 2024 in Seoul, Korea |
Department(s)/field(s): | 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Robust Data Science; LOEWE > LOEWE Centres > emergenCITY; Central Facilities > University Computing Centre (HRZ) > High-Performance Computing |
Date deposited: | 03 Apr 2024 11:44 |
Last modified: | 03 Apr 2024 11:44 |