A hierarchical model for syllable recognition

Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian (2007)
A hierarchical model for syllable recognition.
15th European Symposium on Artificial Neural Networks. Bruges, Belgium (25.-27.04.2007)
Conference or Workshop Item, Bibliography

Abstract

Inspired by recent findings on the similarities between the primary auditory and visual cortex, we propose a neural network for speech recognition based on a hierarchical feedforward architecture for visual object recognition. When a Gammatone filterbank is used for the spectral analysis, the resulting spectrograms of syllables can be interpreted as images. After preprocessing that enhances the formants in the speech signal and a length normalization, the images can then be fed into the visual hierarchy. We demonstrate the validity of our approach on the recognition of 25 different monosyllabic words and compare the results to the Sphinx-4 speech recognition system. Especially for noisy speech, our hierarchical model achieves a clear improvement.

Item Type: Conference or Workshop Item
Published: 2007
Creators: Domont, Xavier ; Heckmann, Martin ; Wersing, Heiko ; Joublin, Frank ; Goerick, Christian
Type of entry: Bibliography
Title: A hierarchical model for syllable recognition
Language: English
Date: 28 April 2007
Book Title: Proceedings of the European Symposium on Artificial Neural Networks 2007
Event Title: 15th European Symposium on Artificial Neural Networks
Event Location: Bruges, Belgium
Event Dates: 25.-27.04.2007
URL / URN: https://www.esann.org/proceedings/2007
Divisions: 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institut für Automatisierungstechnik und Mechatronik
18 Department of Electrical Engineering and Information Technology > Institut für Automatisierungstechnik und Mechatronik > Control Methods and Robotics (from 01.08.2022 renamed Control Methods and Intelligent Systems)
Date Deposited: 16 Aug 2010 14:31
Last Modified: 18 Apr 2023 13:01