Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian (2007)
A hierarchical model for syllable recognition.
15th European Symposium on Artificial Neural Networks. Bruges, Belgium (25.-27.04.2007)
Conference or Workshop Item, Bibliography
Abstract
Inspired by recent findings on the similarities between the primary auditory and visual cortex we propose a neural network for speech recognition based on a hierarchical feedforward architecture for visual object recognition. When using a Gammatone filterbank for the spectral analysis the resulting spectrograms of syllables can be interpreted as images. After a preprocessing enhancing the formants in the speech signal and a length normalization, the images can then be fed into the visual hierarchy. We demonstrate the validity of our approach on the recognition of 25 different monosyllabic words and compare the results to the Sphinx-4 speech recognition system. Especially for noisy speech our hierarchical model achieves a clear improvement.
Item Type: | Conference or Workshop Item |
---|---|
Published: | 2007 |
Creators: | Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian |
Type of entry: | Bibliography |
Title: | A hierarchical model for syllable recognition |
Language: | English |
Date: | 28 April 2007 |
Book Title: | Proceedings of the European Symposium on Artificial Neural Networks 2007 |
Event Title: | 15th European Symposium on Artificial Neural Networks |
Event Location: | Bruges, Belgium |
Event Dates: | 25.-27.04.2007 |
URL / URN: | https://www.esann.org/proceedings/2007 |
Divisions: | 18 Department of Electrical Engineering and Information Technology ; 18 Department of Electrical Engineering and Information Technology > Institut für Automatisierungstechnik und Mechatronik ; 18 Department of Electrical Engineering and Information Technology > Institut für Automatisierungstechnik und Mechatronik > Control Methods and Robotics (from 01.08.2022 renamed Control Methods and Intelligent Systems) |
Date Deposited: | 16 Aug 2010 14:31 |
Last Modified: | 18 Apr 2023 13:01 |