TU Darmstadt / ULB / TUbiblio

High-Dimensional False Discovery Rate Control for Dependent Variables

Machkour, Jasin ; Muma, Michael ; Palomar, Daniel P. (2024)
High-Dimensional False Discovery Rate Control for Dependent Variables.
doi: 10.48550/arXiv.2401.15796
Report, Bibliographie

Kurzbeschreibung (Abstract)

Algorithms that ensure reproducible findings from large-scale, high-dimensional data are pivotal in numerous signal processing applications. In recent years, multivariate false discovery rate (FDR) controlling methods have emerged, providing guarantees even in high-dimensional settings where the number of variables surpasses the number of samples. However, these methods often fail to reliably control the FDR in the presence of highly dependent variable groups, a common characteristic in fields such as genomics and finance. To tackle this critical issue, we introduce a novel framework that accounts for general dependency structures. Our proposed dependency-aware T-Rex selector integrates hierarchical graphical models within the T-Rex framework to effectively harness the dependency structure among variables. Leveraging martingale theory, we prove that our variable penalization mechanism ensures FDR control. We further generalize the FDR-controlling framework by stating and proving a clear condition necessary for designing both graphical and non-graphical models that capture dependencies. Additionally, we formulate a fully integrated optimal calibration algorithm that concurrently determines the parameters of the graphical model and the T-Rex framework, such that the FDR is controlled while maximizing the number of selected variables. Numerical experiments and a breast cancer survival analysis use-case demonstrate that the proposed method is the only one among the state-of-the-art benchmark methods that controls the FDR and reliably detects genes that have been previously identified to be related to breast cancer. An open-source implementation is available within the R package TRexSelector on CRAN.

Typ des Eintrags: Report
Erschienen: 2024
Autor(en): Machkour, Jasin ; Muma, Michael ; Palomar, Daniel P.
Art des Eintrags: Bibliographie
Titel: High-Dimensional False Discovery Rate Control for Dependent Variables
Sprache: Englisch
Publikationsjahr: 30 Januar 2024
Verlag: arXiV
Reihe: Methodology
Auflage: 2. Version
DOI: 10.48550/arXiv.2401.15796
Kurzbeschreibung (Abstract):

Algorithms that ensure reproducible findings from large-scale, high-dimensional data are pivotal in numerous signal processing applications. In recent years, multivariate false discovery rate (FDR) controlling methods have emerged, providing guarantees even in high-dimensional settings where the number of variables surpasses the number of samples. However, these methods often fail to reliably control the FDR in the presence of highly dependent variable groups, a common characteristic in fields such as genomics and finance. To tackle this critical issue, we introduce a novel framework that accounts for general dependency structures. Our proposed dependency-aware T-Rex selector integrates hierarchical graphical models within the T-Rex framework to effectively harness the dependency structure among variables. Leveraging martingale theory, we prove that our variable penalization mechanism ensures FDR control. We further generalize the FDR-controlling framework by stating and proving a clear condition necessary for designing both graphical and non-graphical models that capture dependencies. Additionally, we formulate a fully integrated optimal calibration algorithm that concurrently determines the parameters of the graphical model and the T-Rex framework, such that the FDR is controlled while maximizing the number of selected variables. Numerical experiments and a breast cancer survival analysis use-case demonstrate that the proposed method is the only one among the state-of-the-art benchmark methods that controls the FDR and reliably detects genes that have been previously identified to be related to breast cancer. An open-source implementation is available within the R package TRexSelector on CRAN.

Fachbereich(e)/-gebiet(e): 18 Fachbereich Elektrotechnik und Informationstechnik
18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik
18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik > Robust Data Science
LOEWE
LOEWE > LOEWE-Zentren
LOEWE > LOEWE-Zentren > emergenCITY
Zentrale Einrichtungen
Zentrale Einrichtungen > Hochschulrechenzentrum (HRZ)
Zentrale Einrichtungen > Hochschulrechenzentrum (HRZ) > Hochleistungsrechner
Hinterlegungsdatum: 03 Apr 2024 11:41
Letzte Änderung: 03 Apr 2024 11:45
PPN:
Export:
Suche nach Titel in: TUfind oder in Google
Frage zum Eintrag Frage zum Eintrag

Optionen (nur für Redakteure)
Redaktionelle Details anzeigen Redaktionelle Details anzeigen