Keul, Frank ; Hess, Martin ; Goesele, Michael ; Hamacher, Kay (2017)
PFASUM: a substitution matrix from Pfam structural alignments.
In: BMC Bioinformatics, 2017, 18 (1)
Article, secondary publication, publisher's version
A newer version of this record is available.
Abstract
Background
Detecting homologous protein sequences and computing multiple sequence alignments (MSA) are fundamental tasks in molecular bioinformatics. These tasks usually require a substitution matrix for modeling evolutionary substitution events derived from a set of aligned sequences. Over the last years, the known sequence space increased drastically and several publications demonstrated that this can lead to significantly better performing matrices. Interestingly, matrices based on dated sequence datasets are still the de facto standard for both tasks even though their data basis may limit their capabilities.
We address these aspects by presenting a new substitution matrix series called PFASUM. These matrices are derived from Pfam seed MSAs using a novel algorithm and thus build upon expert ground truth data covering a large and diverse sequence space.
Results
We show results for two use cases: First, we tested the homology search performance of PFASUM matrices on up-to-date ASTRAL databases with varying sequence similarity. Our study shows that the usage of PFASUM matrices can lead to significantly better homology search results when compared to conventional matrices. PFASUM matrices with comparable relative entropies to the commonly used substitution matrices BLOSUM50, BLOSUM62, PAM250, VTML160 and VTML200 outperformed their corresponding counterparts in 93% of all test cases. A general assessment also comparing matrices with different relative entropies showed that PFASUM matrices delivered the best homology search performance in the test set.
Second, our results demonstrate that the usage of PFASUM matrices for MSA construction improves their quality when compared to conventional matrices. On up-to-date MSA benchmarks, at least 60% of all MSAs were reconstructed in an equal or higher quality when using MUSCLE with PFASUM31, PFASUM43 and PFASUM60 matrices instead of conventional matrices. This rate even increases to at least 76% for MSAs containing similar sequences.
Conclusions
We present the novel PFASUM substitution matrices derived from manually curated MSA ground truth data covering the currently known sequence space. Our results imply that PFASUM matrices improve homology search performance as well as MSA quality in many cases when compared to conventional substitution matrices. Hence, we encourage the usage of PFASUM matrices and especially PFASUM60 for these specific tasks.
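The abstract refers to substitution matrices derived from aligned sequences and compares matrices by relative entropy. As a rough illustration of those two notions, the following minimal sketch computes BLOSUM-style log-odds scores and the relative entropy from a toy alignment. It is not the PFASUM algorithm, which the abstract does not describe; it follows the classic Henikoff-style pair counting without any sequence clustering or weighting, and the function names and the toy columns are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_frequencies(columns):
    """Count unordered residue pairs per alignment column; return q_ij."""
    counts = Counter()
    for col in columns:
        residues = [r for r in col if r != "-"]  # ignore gap characters
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def background(q):
    """Marginal residue frequencies p_i implied by the pair frequencies."""
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    return dict(p)


def log_odds(q):
    """Scores s_ij = log2(q_ij / e_ij), in bits, for observed pairs only."""
    p = background(q)
    scores = {}
    for (a, b), f in q.items():
        # Expected frequency of an unordered pair under independence.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(f / expected)
    return scores


def relative_entropy(q, scores):
    """H = sum over pairs of q_ij * s_ij (bits per aligned pair)."""
    return sum(q[pair] * scores[pair] for pair in q)


# Toy alignment columns (hypothetical data, two-letter alphabet for brevity).
columns = ["AAAA", "AAGG", "GGGG"]
q = pair_frequencies(columns)
scores = log_odds(q)
H = relative_entropy(q, scores)
print(scores)
print(H)
```

In this scheme a higher relative entropy corresponds to a matrix tuned to more similar sequences, which is why matrices from different series are usually compared at matching relative entropy.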
Item type: | Article |
---|---|
Published: | 2017 |
Author(s): | Keul, Frank ; Hess, Martin ; Goesele, Michael ; Hamacher, Kay |
Type of entry: | Secondary publication |
Title: | PFASUM: a substitution matrix from Pfam structural alignments |
Language: | English |
Date of publication: | 5 June 2017 |
Place of publication: | Darmstadt |
Date of first publication: | 2017 |
Publisher: | BioMed Central |
Title of journal, newspaper or series: | BMC Bioinformatics |
Volume: | 18 |
Issue number: | 1 |
URL / URN: | http://tuprints.ulb.tu-darmstadt.de/6510/ |
Origin: | Secondary publication from funded Golden Open Access |
Status: | Publisher's version |
URN: | urn:nbn:de:tuda-tuprints-65108 |
Dewey Decimal Classification (DDC) subject group: | 500 Natural sciences and mathematics > 570 Life sciences, biology |
Department(s)/field(s): | 10 Department of Biology; 10 Department of Biology > Computational Biology and Simulation; 20 Department of Computer Science; 20 Department of Computer Science > Graphics, Capture and Massively Parallel Computing |
Date deposited: | 01 Oct 2017 19:55 |
Last modified: | 05 Jan 2024 10:09 |
Available versions of this record
- PFASUM: a substitution matrix from Pfam structural alignments. (deposited 01 Oct 2017 19:55) [Currently displayed]