Kussel, Tobias (2022)
Graph Structures in Privacy-Preserving Biomedical Analyses.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00021828
Ph.D. Thesis, Primary publication, Publisher's Version
Abstract
Graph theory is one of the truly interdisciplinary fields of research. Not only does the application of graphs lead to new insights in a variety of disciplines — physics, biology, sociology, mathematics, and computer science — but all of these disciplines, with their different questions and perspectives, actively influence the development and research of graph theory itself. Being able to abstract interconnected and interdependent systems and describe them in a mathematically exact way offers a different way of describing the problem when facing a diverse array of questions. It favors finding certain “topology-based” approaches while simultaneously providing a methodological toolbox; especially statistical physics offers numerous examples for this.
In this thesis, I consider various research questions in the field of distributed biomedical analysis. In this context, the health data used pose a significant risk to the privacy of the individuals involved — still, many conditions are socially stigmatized, so their disclosure could lead to ostracization, occupational disadvantages, and even physical harm. This is reflected in legislation, which requires special protection for health data — and other data that carry the risk of identifying an individual.
In this dissertation, I present the design, implementation, and empirical analysis of methods and algorithms that are not only relevant for obtaining medical knowledge and developing new treatment methods, but at the same time provide an exceedingly high level of protection concerning sensitive data. The developed methods presented hereafter use the cryptographic techniques of Secure Multi-Party Computation (MPC), a class of cryptographic protocols that enables the joint computation over distributed datasets while simultaneously providing highest security guarantees regarding the secrecy of the (distributed) input data. Graphs play a central role in this context: the desired functionalities are mapped to Boolean or arithmetic Directed Acyclic Graphs (DAGs), where the nodes represent operations or interactive protocols.
However, not only do the methodology and techniques employed in this dissertation make use of approaches from statistical physics and graph theory — many of the medical questions directly relate to graph problems or lead to more efficient solutions when interpreted as graphs. Specifically, a method is developed to efficiently analyze epistasis relationships, i.e., non-linear gene-gene and gene-environment interactions, and the resulting gene regulatory networks in a privacy-preserving manner. To enable the processing of the required amounts of genetic data within feasible time scales, novel generally applicable MPC building blocks were developed. Furthermore, the graph optimization problem of kidney exchanges — the efficient and fair allocation of donors’ kidneys for kidney transplantation — is explored. Since sensitive medical data are involved as well, these data protection considerations are of central interest here. In addition to the many boundary conditions to this problem, for example regarding the robustness of the solutions or the medical compatibility of organ donors and recipients, finding the global solution has been proven to be a problem with superpolynomial complexity. Hence, a locally optimal algorithm to solve this issue is presented in this work. Finally, the problem of probabilistic record linkage, which is highly relevant in medical research, is considered. Based on the personal data such as first name, last name, address, and birthday, the similarity between patients in one or more databases is to be assessed, with the goal of identifying duplicates even in the presence of noisy data, i.e., misspellings and interchanged identifiers. Via the path of graph representation and the associated search for approximated subgraph isomorphisms, record linkage is reduced to a (simpler) database problem with efficiently solvable algorithms. The developed framework Mainzelliste Secure EpiLinker (MainSEL) provides a means for secure, privacy-preserving probabilistic record linkage between different medical institutions. However, it is also capable of comparing other data types for their respective similarity. At the end of this dissertation, three promising but not yet extensively explored extensions are presented that allow the matching of pharmacologically relevant small molecule graphs, the search for similar patients in clinical databases for nonspecific diagnoses, or the identification of disrupted biological regulatory pathways by comparing transcriptome networks.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Erschienen: | 2022 | ||||
Creators: | Kussel, Tobias | ||||
Type of entry: | Primary publication | ||||
Title: | Graph Structures in Privacy-Preserving Biomedical Analyses | ||||
Language: | English | ||||
Referees: | Hamacher, Prof. Dr. Kay ; Schneider, Prof. Dr. Thomas | ||||
Date: | 2022 | ||||
Place of Publication: | Darmstadt | ||||
Collation: | V, 166 Seiten | ||||
Refereed: | 20 July 2022 | ||||
DOI: | 10.26083/tuprints-00021828 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/21828 | ||||
Abstract: | Graph theory is one of the truly interdisciplinary fields of research. Not only does the application of graphs lead to new insights in a variety of disciplines — physics, biology, sociology, mathematics, and computer science — but all of these disciplines, with their different questions and perspectives, actively influence the development and research of graph theory itself. Being able to abstract interconnected and interdependent systems and describe them in a mathematically exact way offers a different way of describing the problem when facing a diverse array of questions. It favors finding certain “topology-based” approaches while simultaneously providing a methodological toolbox; especially statistical physics offers numerous examples for this. In this thesis, I consider various research questions in the field of distributed biomedical analysis. In this context, the health data used pose a significant risk to the privacy of the individuals involved — still, many conditions are socially stigmatized, so their disclosure could lead to ostracization, occupational disadvantages, and even physical harm. This is reflected in legislation, which requires special protection for health data — and other data that carry the risk of identifying an individual. In this dissertation, I present the design, implementation, and empirical analysis of methods and algorithms that are not only relevant for obtaining medical knowledge and developing new treatment methods, but at the same time provide an exceedingly high level of protection concerning sensitive data. The developed methods presented hereafter use the cryptographic techniques of Secure Multi-Party Computation (MPC), a class of cryptographic protocols that enables the joint computation over distributed datasets while simultaneously providing highest security guarantees regarding the secrecy of the (distributed) input data. Graphs play a central role in this context: the desired functionalities are mapped to Boolean or arithmetic Directed Acyclic Graphs (DAGs), where the nodes represent operations or interactive protocols. However, not only do the methodology and techniques employed in this dissertation make use of approaches from statistical physics and graph theory — many of the medical questions directly relate to graph problems or lead to more efficient solutions when interpreted as graphs. Specifically, a method is developed to efficiently analyze epistasis relationships, i.e., non-linear gene-gene and gene-environment interactions, and the resulting gene regulatory networks in a privacy-preserving manner. To enable the processing of the required amounts of genetic data within feasible time scales, novel generally applicable MPC building blocks were developed. Furthermore, the graph optimization problem of kidney exchanges — the efficient and fair allocation of donors’ kidneys for kidney transplantation — is explored. Since sensitive medical data are involved as well, these data protection considerations are of central interest here. In addition to the many boundary conditions to this problem, for example regarding the robustness of the solutions or the medical compatibility of organ donors and recipients, finding the global solution has been proven to be a problem with superpolynomial complexity. Hence, a locally optimal algorithm to solve this issue is presented in this work. Finally, the problem of probabilistic record linkage, which is highly relevant in medical research, is considered. Based on the personal data such as first name, last name, address, and birthday, the similarity between patients in one or more databases is to be assessed, with the goal of identifying duplicates even in the presence of noisy data, i.e., misspellings and interchanged identifiers. Via the path of graph representation and the associated search for approximated subgraph isomorphisms, record linkage is reduced to a (simpler) database problem with efficiently solvable algorithms. The developed framework Mainzelliste Secure EpiLinker (MainSEL) provides a means for secure, privacy-preserving probabilistic record linkage between different medical institutions. However, it is also capable of comparing other data types for their respective similarity. At the end of this dissertation, three promising but not yet extensively explored extensions are presented that allow the matching of pharmacologically relevant small molecule graphs, the search for similar patients in clinical databases for nonspecific diagnoses, or the identification of disrupted biological regulatory pathways by comparing transcriptome networks. |
||||
Alternative Abstract: |
|
||||
Uncontrolled Keywords: | Multi-Party Computation, Graphtheorie, Medizin, IT Sicherheit | ||||
Status: | Publisher's Version | ||||
URN: | urn:nbn:de:tuda-tuprints-218285 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science 500 Science and mathematics > 500 Science 500 Science and mathematics > 530 Physics 500 Science and mathematics > 570 Life sciences, biology 600 Technology, medicine, applied sciences > 610 Medicine and health |
||||
Divisions: | 10 Department of Biology 10 Department of Biology > Computational Biology and Simulation |
||||
TU-Projects: | Bund/BMBF|01ZZ1802G|HIGHmed DLR|01ZZ1911F|CORD_MI |
||||
Date Deposited: | 10 Aug 2022 12:09 | ||||
Last Modified: | 11 Aug 2022 06:03 | ||||
PPN: | |||||
Referees: | Hamacher, Prof. Dr. Kay ; Schneider, Prof. Dr. Thomas | ||||
Refereed / Verteidigung / mdl. Prüfung: | 20 July 2022 | ||||
Export: | |||||
Suche nach Titel in: | TUfind oder in Google |
Send an inquiry |
Options (only for editors)
Show editorial Details |