
Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop

Scheidt, Fabian ; Machkour, Jasin ; Muma, Michael (2023)
Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop.
9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing. Herradura, Costa Rica (10.12.2023 - 13.12.2023)
doi: 10.1109/CAMSAP58249.2023.10403478
Conference publication, Bibliography

Abstract

Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive, as well as two new dummy generation strategies based on permutations of a reference matrix. Our numerical experiments demonstrate a drastic reduction in memory demand and computation time. We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes. Our work empowers researchers without access to high-performance clusters to make reproducible discoveries in large-scale high-dimensional data.
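The abstract names two ingredients, matrices memory-mapped from a solid-state drive and dummy variables obtained by permuting a reference matrix, without describing either in detail. The Python/NumPy sketch below illustrates how such ideas can combine; it is an assumption-laden illustration, not the Big T-Rex implementation. It assumes (hypothetically) that each dummy matrix is a row permutation of one on-disk reference matrix, so a random experiment only has to store a permutation index vector. The helper names `create_reference` and `dummy_correlations` are invented for this example.

```python
import numpy as np


def create_reference(path, n, p, seed=0, block=1024):
    """Write an n x p standard-normal reference matrix to disk, block by block.

    Hypothetical helper: column-major ("F") layout keeps each column block
    contiguous in the file, so only one block needs to sit in RAM at a time.
    """
    rng = np.random.default_rng(seed)
    ref = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, p), order="F")
    for start in range(0, p, block):
        stop = min(start + block, p)
        ref[:, start:stop] = rng.standard_normal((n, stop - start))
    ref.flush()
    del ref  # release the writable mapping
    # Reopen read-only: pages are pulled from disk on demand.
    return np.memmap(path, dtype=np.float64, mode="r", shape=(n, p), order="F")


def dummy_correlations(ref, perm, r, block=1024):
    """Compute D^T r for the dummy matrix D = ref[perm, :] without building D.

    Assumed dummy model: D[i, j] = ref[perm[i], j]. Then
    (D^T r)_j = sum_i ref[perm[i], j] * r[i] = (ref^T r_tilde)_j
    with r_tilde[perm[i]] = r[i], so only the permutation is stored.
    """
    r_tilde = np.empty_like(r)
    r_tilde[perm] = r
    _, p = ref.shape
    out = np.empty(p)
    for start in range(0, p, block):
        stop = min(start + block, p)
        # Only an n x block slice of the reference is read into memory here.
        out[start:stop] = ref[:, start:stop].T @ r_tilde
    return out
```

Usage: create the reference once, e.g. `ref = create_reference("reference.dat", n=100, p=5000)`, then for each random experiment draw `perm = np.random.default_rng(1).permutation(100)` and call `dummy_correlations(ref, perm, r)` for a residual vector `r`. Correlations of this kind are what LARS-type Lasso solvers use to pick the next variable; how Big T-Rex actually organizes these computations is not stated in this record.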

Item type: Conference publication
Published: 2023
Author(s): Scheidt, Fabian ; Machkour, Jasin ; Muma, Michael
Entry type: Bibliography
Title: Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop
Language: English
Publication date: 14 December 2023
Place of publication: Piscataway, NJ
Publisher: IEEE
Book title: 2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
Event title: 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing
Event location: Herradura, Costa Rica
Event dates: 10.12.2023 - 13.12.2023
DOI: 10.1109/CAMSAP58249.2023.10403478

Department(s)/division(s): 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications
18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Robust Data Science
18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Signal Processing
LOEWE
LOEWE > LOEWE Centres
LOEWE > LOEWE Centres > emergenCITY
Date deposited: 03 Apr 2024 11:38
Last modified: 30 Jul 2024 14:48
PPN: 520226011