Möllering, Helen (2023)
Towards Practical Privacy-Preserving Clustering and Health Care Data Analyses.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024708
Dissertation, Erstveröffentlichung, Verlagsversion
Kurzbeschreibung (Abstract)
With the rapid growth in the adoption and democratization of advanced data analysis techniques such as ML and AI, it is imperative to address the privacy concerns arising from the inherent pervasive collection and utilization of sensitive information. In this context, cryptographic secure computation techniques play a crucial role as they enable data analysis while upholding the confidentiality of the input data. These techniques however come with a significant performance overhead compared to plaintext computation. Moreover, some phases of the analysis process, e.g., data preparation, parameter determination, and quality evaluation, have not been adequately considered by the cryptographic community. As a consequence, state-of-the-art secure computation protocols for privacy-preserving data analysis and machine learning nowadays are often not yet ready for being deployed in real-world applications.
In this thesis, we start by investigating the deficiencies of existing secure computation-based solutions for data analyses. Based on our analysis, we take an end-to-end approach to design, implement, and evaluate practical privacy-preserving protocols tailored for specific use cases. We thereby focus on two directions: The first part investigates privacy-preserving protocols for a general class of ML algorithms, namely clustering. The clustering protocols we design combine cryptographic techniques and protocol optimizations with the real-world requirements of plaintext clustering. The second part introduces efficient secure computation protocols for two concrete data analysis applications from the medical domain: The matching of compatible donors and patients for kidney donations and epidemiological modeling. Our solutions critically incorporate interdisciplinary insights.
Practical Privacy-Preserving Clustering. Clustering, an unsupervised ML technique that enables the identification of underlying patterns and structures within data, has a wide range of applications in diverse fields such as health care, marketing, and finance. To ensure the privacy of sensitive input data, numerous secure computation protocols have been proposed for privacy-preserving clustering in recent years. Unfortunately, evaluating their suitability for specific applications and comparing them in terms of privacy guarantees, efficiency, and quality is often complex due to variations in underlying plaintext clustering algorithms, cryptographic techniques, security models, as well as intended participant scenarios and use cases.
We systematize the state-of-the-art in privacy-preserving clustering, introduce criteria on how to assess the suitability of a protocol for a specific use case, and lay out open research questions that need to be solved to make private clustering practical for real-world applications.
Based on the results of our systematization, we introduce the first practical and fully privacy-preserving density-based clustering protocol. In contrast to the K-means clustering algorithm, which is commonly used as a baseline in previous private clustering protocols, our private density-based clustering protocol flexibly determines the number of clusters that suits the input data well and is insensitive to outliers. This makes it much more attractive for real-world applications than a private K-means protocol. One application is the distributed ML training concept, FL, which is susceptible to backdoor attacks manipulating the model in addition to inference attacks extracting information about the training data used. We devise a novel defense mechanism for FL based on our privacy-preserving density based clustering protocol.
This part of the thesis is based on the following three publications: [HMSY21] A. HEGDE, H. MÖLLERING, T. SCHNEIDER, H. YALAME. “SoK: Efficient privacy-preserving clustering”. In: Proceedings on Privacy Enhancing Technologies (PoPETs) 2021.4 (2021). Online: https:/ /ia.cr/2021/809. Code: https://encrypto.de/code/SoK_ppClustering, S. 225–248. CORE Rank A. Appendix A. [BCE+21] B. BOZDEMIR, S. CANARD, O. ERMIS, H. MÖLLERING, M. ÖNEN, T. SCHNEIDER. “Privacy-preserving density-based clustering”. In: ASIA Conference on Computer and Communications Security (ASIACCS). Online: https://ia.cr/2021/612. Code: https://encrypto.de/code/ppDBSCAN. ACM, 2021, S. 658–671. CORE Rank A. Appendix B. [NRC+22] T. D. NGUYEN, P. RIEGER, H. CHEN, H. YALAME, H. MÖLLERING, H. FEREIDOONI, S. MARCHAL, M. MIETTINEN, A. MIRHOSEINI, S. ZEITOUNI, F. KOUSHANFAR, A.- R. SADEGHI, T. SCHNEIDER. “FLAME: Taming backdoors in federated learning”. In: USENIX Security Symposium (USENIX Security). Online: https://ia.cr/2021/025. USENIX, 2022, S. 1415–1432. CORE Rank A*. Appenidx C.
Privacy-Preserving Health Care Data Analysis. A particularly important area for privacy research is health care data analysis due to the highly sensitive nature of medical information and the strong regulatory requirements for data privacy. Secure computation alleviates the trade-off between data usability and privacy by enabling insightful data analyses while still maintaining the privacy and trust of patients, ultimately advancing the quality of care and medical outcomes. We address two important health care data analysis applications in this work. Firstly, we design a novel secure computation protocol addressing the KEP. Secondly, we introduce the problem and first solutions for privacy-preserving epidemiological modeling.
The goal of the KEP is to arrange a series of mutual exchanges between donor-patient pairs. These patients are in need of a kidney donation and unfortunately only have incompatible donors in their own social network. Our secure computation protocol SPIKE finds a locally optimal set of exchange cycles of compatible donors and patients and improves privacy compared to currently deployed centralized approaches. Our approach can reduce legal burdens by keeping patient data locally private and significantly improves efficiency and medical robustness compared to that of the previous state-of-the-art by Breuer et al. (CODASPY'22).
The second highly relevant health care application that we address in this thesis is epidemiological modeling. It predicts the spread of an infectious disease and facilitates the discovery of effective containment strategies. So far, however, epidemiological models often suffer from the lack of precise information about physical contacts, hindering an accurate prediction. We present the RIPPLE framework, in which we formally define privacy-preserving epidemiological modeling. It enables running epidemiological simulations on the up-to-date contact graph of participants without affecting their individual privacy. Additionally, we present two practical instantations with different security and efficiency trade-offs that can run distributed simulations with half a million people in just a few minutes.
This part of the thesis is based on the following publication and technical report: [BHK+22] T. BIRKA, K. HAMACHER, T. KUSSEL, H. MÖLLERING, T. SCHNEIDER. “SPIKE: Secure and private investigation of the kidney exchange problem”. In: BMC Medical Informatics and Decision Making 22.1 (2022). Online: https://arxiv.org/abs/2204.09937. Code: https://github.com/encryptogroup/ppke, S. 253. CORE Rank B. Appendix D. [GHJ+23] D. GÜNTHER, M. HOLZ, B. JUDKEWITZ, H. MÖLLERING, B. PINKAS, T. SCHNEIDER, A. SURESH. “Privacy-preserving epidemiological modeling on mobile graphs”. https://ia.cr/2020/1546. Code: https://zenodo.org/record/6599225. 2023. Appendix E.
Overall, this thesis contributes to make privacy-preserving clustering and secure computation protocols for medical analyses more practical for real-world usage.
Typ des Eintrags: | Dissertation | ||||
---|---|---|---|---|---|
Erschienen: | 2023 | ||||
Autor(en): | Möllering, Helen | ||||
Art des Eintrags: | Erstveröffentlichung | ||||
Titel: | Towards Practical Privacy-Preserving Clustering and Health Care Data Analyses | ||||
Sprache: | Englisch | ||||
Referenten: | Schneider, Prof. Dr. Thomas ; Önen, Prof. PhD Melek | ||||
Publikationsjahr: | 13 Oktober 2023 | ||||
Ort: | Darmstadt | ||||
Kollation: | 224 Seiten in verschiedenen Zählungen | ||||
Datum der mündlichen Prüfung: | 29 September 2023 | ||||
DOI: | 10.26083/tuprints-00024708 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/24708 | ||||
Kurzbeschreibung (Abstract): | With the rapid growth in the adoption and democratization of advanced data analysis techniques such as ML and AI, it is imperative to address the privacy concerns arising from the inherent pervasive collection and utilization of sensitive information. In this context, cryptographic secure computation techniques play a crucial role as they enable data analysis while upholding the confidentiality of the input data. These techniques however come with a significant performance overhead compared to plaintext computation. Moreover, some phases of the analysis process, e.g., data preparation, parameter determination, and quality evaluation, have not been adequately considered by the cryptographic community. As a consequence, state-of-the-art secure computation protocols for privacy-preserving data analysis and machine learning nowadays are often not yet ready for being deployed in real-world applications. In this thesis, we start by investigating the deficiencies of existing secure computation-based solutions for data analyses. Based on our analysis, we take an end-to-end approach to design, implement, and evaluate practical privacy-preserving protocols tailored for specific use cases. We thereby focus on two directions: The first part investigates privacy-preserving protocols for a general class of ML algorithms, namely clustering. The clustering protocols we design combine cryptographic techniques and protocol optimizations with the real-world requirements of plaintext clustering. The second part introduces efficient secure computation protocols for two concrete data analysis applications from the medical domain: The matching of compatible donors and patients for kidney donations and epidemiological modeling. Our solutions critically incorporate interdisciplinary insights. Practical Privacy-Preserving Clustering. Clustering, an unsupervised ML technique that enables the identification of underlying patterns and structures within data, has a wide range of applications in diverse fields such as health care, marketing, and finance. To ensure the privacy of sensitive input data, numerous secure computation protocols have been proposed for privacy-preserving clustering in recent years. Unfortunately, evaluating their suitability for specific applications and comparing them in terms of privacy guarantees, efficiency, and quality is often complex due to variations in underlying plaintext clustering algorithms, cryptographic techniques, security models, as well as intended participant scenarios and use cases. We systematize the state-of-the-art in privacy-preserving clustering, introduce criteria on how to assess the suitability of a protocol for a specific use case, and lay out open research questions that need to be solved to make private clustering practical for real-world applications. Based on the results of our systematization, we introduce the first practical and fully privacy-preserving density-based clustering protocol. In contrast to the K-means clustering algorithm, which is commonly used as a baseline in previous private clustering protocols, our private density-based clustering protocol flexibly determines the number of clusters that suits the input data well and is insensitive to outliers. This makes it much more attractive for real-world applications than a private K-means protocol. One application is the distributed ML training concept, FL, which is susceptible to backdoor attacks manipulating the model in addition to inference attacks extracting information about the training data used. We devise a novel defense mechanism for FL based on our privacy-preserving density based clustering protocol. This part of the thesis is based on the following three publications: [HMSY21] A. HEGDE, H. MÖLLERING, T. SCHNEIDER, H. YALAME. “SoK: Efficient privacy-preserving clustering”. In: Proceedings on Privacy Enhancing Technologies (PoPETs) 2021.4 (2021). Online: https:/ /ia.cr/2021/809. Code: https://encrypto.de/code/SoK_ppClustering, S. 225–248. CORE Rank A. Appendix A. [BCE+21] B. BOZDEMIR, S. CANARD, O. ERMIS, H. MÖLLERING, M. ÖNEN, T. SCHNEIDER. “Privacy-preserving density-based clustering”. In: ASIA Conference on Computer and Communications Security (ASIACCS). Online: https://ia.cr/2021/612. Code: https://encrypto.de/code/ppDBSCAN. ACM, 2021, S. 658–671. CORE Rank A. Appendix B. [NRC+22] T. D. NGUYEN, P. RIEGER, H. CHEN, H. YALAME, H. MÖLLERING, H. FEREIDOONI, S. MARCHAL, M. MIETTINEN, A. MIRHOSEINI, S. ZEITOUNI, F. KOUSHANFAR, A.- R. SADEGHI, T. SCHNEIDER. “FLAME: Taming backdoors in federated learning”. In: USENIX Security Symposium (USENIX Security). Online: https://ia.cr/2021/025. USENIX, 2022, S. 1415–1432. CORE Rank A*. Appenidx C. Privacy-Preserving Health Care Data Analysis. A particularly important area for privacy research is health care data analysis due to the highly sensitive nature of medical information and the strong regulatory requirements for data privacy. Secure computation alleviates the trade-off between data usability and privacy by enabling insightful data analyses while still maintaining the privacy and trust of patients, ultimately advancing the quality of care and medical outcomes. We address two important health care data analysis applications in this work. Firstly, we design a novel secure computation protocol addressing the KEP. Secondly, we introduce the problem and first solutions for privacy-preserving epidemiological modeling. The goal of the KEP is to arrange a series of mutual exchanges between donor-patient pairs. These patients are in need of a kidney donation and unfortunately only have incompatible donors in their own social network. Our secure computation protocol SPIKE finds a locally optimal set of exchange cycles of compatible donors and patients and improves privacy compared to currently deployed centralized approaches. Our approach can reduce legal burdens by keeping patient data locally private and significantly improves efficiency and medical robustness compared to that of the previous state-of-the-art by Breuer et al. (CODASPY'22). The second highly relevant health care application that we address in this thesis is epidemiological modeling. It predicts the spread of an infectious disease and facilitates the discovery of effective containment strategies. So far, however, epidemiological models often suffer from the lack of precise information about physical contacts, hindering an accurate prediction. We present the RIPPLE framework, in which we formally define privacy-preserving epidemiological modeling. It enables running epidemiological simulations on the up-to-date contact graph of participants without affecting their individual privacy. Additionally, we present two practical instantations with different security and efficiency trade-offs that can run distributed simulations with half a million people in just a few minutes. This part of the thesis is based on the following publication and technical report: [BHK+22] T. BIRKA, K. HAMACHER, T. KUSSEL, H. MÖLLERING, T. SCHNEIDER. “SPIKE: Secure and private investigation of the kidney exchange problem”. In: BMC Medical Informatics and Decision Making 22.1 (2022). Online: https://arxiv.org/abs/2204.09937. Code: https://github.com/encryptogroup/ppke, S. 253. CORE Rank B. Appendix D. [GHJ+23] D. GÜNTHER, M. HOLZ, B. JUDKEWITZ, H. MÖLLERING, B. PINKAS, T. SCHNEIDER, A. SURESH. “Privacy-preserving epidemiological modeling on mobile graphs”. https://ia.cr/2020/1546. Code: https://zenodo.org/record/6599225. 2023. Appendix E. Overall, this thesis contributes to make privacy-preserving clustering and secure computation protocols for medical analyses more practical for real-world usage. |
||||
Alternatives oder übersetztes Abstract: |
|
||||
Status: | Verlagsversion | ||||
URN: | urn:nbn:de:tuda-tuprints-247085 | ||||
Sachgruppe der Dewey Dezimalklassifikatin (DDC): | 000 Allgemeines, Informatik, Informationswissenschaft > 004 Informatik | ||||
Fachbereich(e)/-gebiet(e): | 20 Fachbereich Informatik 20 Fachbereich Informatik > Praktische Kryptographie und Privatheit |
||||
Hinterlegungsdatum: | 13 Okt 2023 11:40 | ||||
Letzte Änderung: | 16 Okt 2023 13:18 | ||||
PPN: | |||||
Referenten: | Schneider, Prof. Dr. Thomas ; Önen, Prof. PhD Melek | ||||
Datum der mündlichen Prüfung / Verteidigung / mdl. Prüfung: | 29 September 2023 | ||||
Export: | |||||
Suche nach Titel in: | TUfind oder in Google |
Frage zum Eintrag |
Optionen (nur für Redakteure)
Redaktionelle Details anzeigen |