Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network

Berninger, Kim and Hoppe, Jannis and Milde, Benjamin:
Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network.
In: Lecture Notes in Computer Science (LNCS), 9924.
[Conference or Workshop Item], (2016)

Abstract

With the increasing popularity of deep learning approaches in speech recognition and classification, many such problems are undergoing a paradigm shift from classic approaches, such as hidden Markov models, to recurrent neural networks (RNNs). In this paper we examine that transition for the ALC corpus, which was used in the Interspeech 2011 Speaker State Challenge. Filter bank (FBANK) features are used together with two types of bidirectional RNNs, each built from gated recurrent units (GRUs). These models classify a speaker's intoxication state solely from recordings of their voice, outperform human listeners, and achieve state-of-the-art results.
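The abstract does not specify layer sizes, pooling, or training details, so the following is only a minimal NumPy sketch, not the authors' model, of how a bidirectional GRU can map a sequence of FBANK frames to an intoxication probability. The feature dimension (40), hidden size (32), mean-over-time pooling, random untrained weights, and all helper names (`init_params`, `run_gru`, `bigru_classify`, etc.) are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(rng, input_dim, hidden_dim, scale=0.1):
    """Random weights for one GRU direction (hypothetical initialisation).

    Rows are stacked as [update gate z | reset gate r | candidate state].
    """
    return {
        "W": rng.normal(0.0, scale, (3 * hidden_dim, input_dim)),
        "U": rng.normal(0.0, scale, (3 * hidden_dim, hidden_dim)),
        "b": np.zeros(3 * hidden_dim),
    }


def gru_step(p, x, h):
    """One GRU update using the convention h_new = (1 - z) * h + z * h_tilde."""
    H = h.shape[0]
    wx = p["W"] @ x + p["b"]
    uh = p["U"][: 2 * H] @ h
    z = sigmoid(wx[:H] + uh[:H])            # update gate
    r = sigmoid(wx[H : 2 * H] + uh[H:])     # reset gate
    h_tilde = np.tanh(wx[2 * H :] + p["U"][2 * H :] @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde


def run_gru(p, frames):
    """Run one GRU direction over a (T, D) frame matrix; returns (T, H) states."""
    h = np.zeros(p["U"].shape[1])
    states = []
    for x in frames:
        h = gru_step(p, x, h)
        states.append(h)
    return np.stack(states)


def init_params(seed=0, input_dim=40, hidden_dim=32):
    """Hypothetical sizes: 40 FBANK coefficients per frame, 32 units per direction."""
    rng = np.random.default_rng(seed)
    return {
        "fwd": init_gru(rng, input_dim, hidden_dim),
        "bwd": init_gru(rng, input_dim, hidden_dim),
        "w_out": rng.normal(0.0, 0.1, 2 * hidden_dim),
        "b_out": 0.0,
    }


def bigru_classify(params, frames):
    """Return P(intoxicated) for one utterance given (T, D) FBANK frames."""
    fwd = run_gru(params["fwd"], frames)              # left-to-right pass
    bwd = run_gru(params["bwd"], frames[::-1])[::-1]  # right-to-left pass, re-aligned in time
    states = np.concatenate([fwd, bwd], axis=1)       # (T, 2H): past and future context per frame
    pooled = states.mean(axis=0)                      # mean over time (assumption)
    return float(sigmoid(params["w_out"] @ pooled + params["b_out"]))


# Usage: random numbers stand in for real FBANK frames of one utterance.
params = init_params(seed=0)
utterance = np.random.default_rng(1).normal(size=(200, 40))
print(bigru_classify(params, utterance))
```

Running a second GRU over the reversed sequence and re-aligning its outputs gives every frame access to both preceding and following context, which is what "bidirectional" refers to. Training the weights (for example with a cross-entropy loss) is omitted, so the printed value is meaningless until the model is fitted to labelled data.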

Item Type: Conference or Workshop Item
Published: 2016
Creators: Berninger, Kim and Hoppe, Jannis and Milde, Benjamin
Title: Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network
Language: German
Title of Book: International Conference on Text, Speech, and Dialogue
Series Name: Lecture Notes in Computer Science (LNCS)
Volume: 9924
Divisions: Department of Computer Science > Telecooperation
Department of Computer Science
Date Deposited: 16 Mar 2017 12:04
DOI: 10.1007/978-3-319-45510-5_50
Identification Number: TUD-CS-2016-14712