Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian (2007)
A hierarchical model for syllable recognition.
15th European Symposium on Artificial Neural Networks. Bruges, Belgium (25.04.2007-27.04.2007)
Conference publication, bibliography
Abstract
Inspired by recent findings on the similarities between the primary auditory and visual cortex, we propose a neural network for speech recognition based on a hierarchical feedforward architecture for visual object recognition. When a Gammatone filterbank is used for the spectral analysis, the resulting spectrograms of syllables can be interpreted as images. After a preprocessing step that enhances the formants in the speech signal and a length normalization, the images can then be fed into the visual hierarchy. We demonstrate the validity of our approach on the recognition of 25 different monosyllabic words and compare the results to the Sphinx-4 speech recognition system. Especially for noisy speech, our hierarchical model achieves a clear improvement.
Item type: | Conference publication |
---|---|
Published: | 2007 |
Author(s): | Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian |
Type of entry: | Bibliography |
Title: | A hierarchical model for syllable recognition |
Language: | English |
Publication date: | 28 April 2007 |
Book title: | Proceedings of the European Symposium on Artificial Neural Networks 2007 |
Event title: | 15th European Symposium on Artificial Neural Networks |
Event location: | Bruges, Belgium |
Event dates: | 25.04.2007-27.04.2007 |
URL / URN: | https://www.esann.org/proceedings/2007 |
Department(s)/field(s): | 18 Fachbereich Elektrotechnik und Informationstechnik; 18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Automatisierungstechnik und Mechatronik; 18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Automatisierungstechnik und Mechatronik > Regelungsmethoden und Robotik (renamed Regelungsmethoden und Intelligente Systeme as of 01.08.2022) |
Date deposited: | 16 Aug 2010 14:31 |
Last modified: | 18 Apr 2023 13:01 |