TU Darmstadt / ULB / TUbiblio

Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge

Tarca, A. L. ; Lauria, M. ; Unger, M. ; Bilal, E. ; Boue, S. ; Kumar Dey, K. ; Hoeng, J. ; Koeppl, H. ; Martin, F. ; Meyer, P. ; Nandy, P. ; Norel, R. ; Peitsch, M. ; Rice, J. ; Romero, R. ; Stolovitzky, G. ; Talikka, M. ; Xiang, Y. ; Zechner, C. (2013)
Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.
In: Bioinformatics (Oxford, England), 29 (22)
Article, Bibliography

Abstract

MOTIVATION: More than a decade after microarrays were first used to predict the phenotype of biological samples, real-life applications for disease screening and for identifying the patients who would benefit most from treatment are still emerging. The scientific community's interest in identifying the best approaches to developing such prediction models was reaffirmed in a competition-style international collaboration, the IMPROVER Diagnostic Signature Challenge, whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas (multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease) and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured different aspects of prediction quality, and the best performers received awards. This article presents the challenge results and introduces to the community the approaches of the three best overall performers, as well as an R package that implements the approach of the best overall team. Our analyses of the model performance data submitted in the challenge, together with additional simulations that we performed, revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approach used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection or classifier type) is problem dependent; and (iii) for optimal results, datasets and methods have to be carefully matched. Biomedical factors such as disease severity and confidence in the diagnosis were found to be associated with misclassification rates across the different teams. AVAILABILITY: The lung cancer dataset is available from Gene Expression Omnibus (accession GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.

Entry type: Article
Published: 2013
Author(s): Tarca, A. L. ; Lauria, M. ; Unger, M. ; Bilal, E. ; Boue, S. ; Kumar Dey, K. ; Hoeng, J. ; Koeppl, H. ; Martin, F. ; Meyer, P. ; Nandy, P. ; Norel, R. ; Peitsch, M. ; Rice, J. ; Romero, R. ; Stolovitzky, G. ; Talikka, M. ; Xiang, Y. ; Zechner, C.
Kind of entry: Bibliography
Title: Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge
Language: English
Publication date: November 2013
Journal title: Bioinformatics (Oxford, England)
Volume: 29
Issue: 22
URL / URN: http://www.ncbi.nlm.nih.gov/pubmed/23966112

Department(s)/division(s): 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Bioinspired Communication Systems
18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications
Date deposited: 04 Apr 2014 11:41
Last modified: 24 Jul 2023 12:52