Scheidt, Fabian ; Machkour, Jasin ; Muma, Michael (2023)
Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop.
9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing. Herradura, Costa Rica (10.12.2023 - 13.12.2023)
doi: 10.1109/CAMSAP58249.2023.10403478
Conference publication, Bibliography
Abstract
Currently, there is an urgent demand for scalable multivariate and high-dimensional false discovery rate (FDR)-controlling variable selection methods to ensure the reproducibility of discoveries. However, among existing methods, only the recently proposed Terminating-Random Experiments (T-Rex) selector scales to problems with millions of variables, as encountered in, e.g., genomics research. The T-Rex selector is a new learning framework based on early-terminated random experiments with computer-generated dummy variables. In this work, we propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption to enable solving FDR-controlled sparse regression problems with millions of variables on a laptop. We incorporate advanced memory-mapping techniques to work with matrices that reside on a solid-state drive (SSD), as well as two new dummy generation strategies based on permutations of a reference matrix. Our numerical experiments demonstrate a drastic reduction in memory demand and computation time. We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes. Our work empowers researchers without access to high-performance clusters to make reproducible discoveries in large-scale high-dimensional data.
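The abstract names two ingredients: matrices kept on an SSD through memory mapping, and dummy variables obtained by permuting a reference matrix. The following is a minimal sketch of both ideas in Python with NumPy, written under our own assumptions; it is not the Big T-Rex implementation. In particular, the row-permutation scheme is one hypothetical way to derive dummies from a reference matrix (the paper's two strategies may differ), and all function names are hypothetical.

```python
import os
import tempfile

import numpy as np


def create_reference_matrix(path, n, p, seed=0, block=1024):
    """Hypothetical helper: write an n x p i.i.d. standard-Gaussian matrix to disk.

    Column-major (Fortran) order keeps each column contiguous in the file,
    and filling in column blocks keeps at most n * block values in RAM.
    """
    rng = np.random.default_rng(seed)
    ref = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, p), order="F")
    for start in range(0, p, block):
        stop = min(start + block, p)
        ref[:, start:stop] = rng.standard_normal((n, stop - start))
    ref.flush()
    del ref  # drop the writable mapping


def open_reference_matrix(path, n, p):
    """Map the reference matrix read-only; the OS pages data in on access."""
    return np.memmap(path, dtype=np.float64, mode="r", shape=(n, p), order="F")


def draw_experiment_permutation(n, rng):
    """Hypothetical scheme: one row permutation per random experiment (n ints)."""
    return rng.permutation(n)


def dummy_columns(ref, perm, cols):
    """Materialize only the requested dummy columns for one experiment.

    Permuting the rows of an i.i.d. Gaussian matrix gives a matrix with the
    same distribution, so no new n x p random matrix has to be generated or
    stored per experiment (dummies of different experiments are, however,
    not independent of each other).
    """
    return np.asarray(ref[:, cols])[perm, :]


if __name__ == "__main__":
    n, p = 100, 5_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reference.dat")
        create_reference_matrix(path, n, p)
        ref = open_reference_matrix(path, n, p)
        perm = draw_experiment_permutation(n, np.random.default_rng(1))
        D = dummy_columns(ref, perm, [0, 42, 4_999])
        print(D.shape)  # (100, 3)
        del ref  # release the mapping before the temp directory is removed
```

In this sketch only the permutation and the dummy columns currently in use live in RAM; the n × p reference matrix stays on disk, which is the kind of memory saving the abstract describes.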
Field | Value |
---|---|
Entry type: | Conference publication |
Published: | 2023 |
Author(s): | Scheidt, Fabian ; Machkour, Jasin ; Muma, Michael |
Entry kind: | Bibliography |
Title: | Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop |
Language: | English |
Publication date: | 14 December 2023 |
Place of publication: | Piscataway, NJ |
Publisher: | IEEE |
Book title: | 2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) |
Event title: | 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing |
Event location: | Herradura, Costa Rica |
Event dates: | 10.12.2023 - 13.12.2023 |
DOI: | 10.1109/CAMSAP58249.2023.10403478 |
Keywords: | emergenCITY, emergenCITY_CPS |
Department(s)/field(s): | 18 Department of Electrical Engineering and Information Technology; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Robust Data Science; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Signal Processing; LOEWE; LOEWE > LOEWE Centers; LOEWE > LOEWE Centers > emergenCITY |
Date deposited: | 03 Apr 2024 11:38 |
Last modified: | 09 Dec 2024 12:11 |
PPN: | 520226011 |