Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network

Berninger, Kim ; Hoppe, Jannis ; Milde, Benjamin (2016)
Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network.
doi: 10.1007/978-3-319-45510-5_50
Conference publication, Bibliography

Abstract

With the increasing popularity of deep learning approaches in speech recognition and classification, many such problems are undergoing a paradigm shift from classic approaches, such as hidden Markov models, to recurrent neural networks (RNNs). In this paper we examine that transition for the ALC corpus, which was used in the Interspeech 2011 Speaker State Challenge. Filter bank (FBANK) features are used alongside two types of bidirectional RNNs, each using gated recurrent units (GRUs). These models classify the intoxication state of speakers from recordings of their voices alone and outperform humans with state-of-the-art results.
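
The abstract names the building blocks (FBANK frames fed to a bidirectional GRU network) but not the architecture details, so the following is only a minimal sketch of the general technique and not the authors' model. Everything specific in it is an assumption made for illustration: 40 filter bank coefficients per frame, 8 hidden units, random untrained weights, omitted bias terms, concatenation of the two final hidden states, and a single sigmoid output. Random numbers stand in for real FBANK features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU layer without bias terms (hypothetical helper for this sketch)."""

    def __init__(self, n_in, n_hid, rng):
        # One input and one recurrent matrix per gate:
        # update (z), reset (r), candidate state (h).
        self.W = {g: rand_matrix(n_hid, n_in, rng) for g in "zrh"}
        self.U = {g: rand_matrix(n_hid, n_hid, rng) for g in "zrh"}
        self.n_hid = n_hid

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], gated))]
        # One common convention: z interpolates between old state and candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, frames):
        # Process a sequence of frames and return the final hidden state.
        h = [0.0] * self.n_hid
        for x in frames:
            h = self.step(x, h)
        return h


def classify(frames, fwd, bwd, w_out, b_out):
    # Bidirectional: one GRU reads the frames forward, the other backward.
    h = fwd.run(frames) + bwd.run(frames[::-1])
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


rng = random.Random(0)
n_fbank, n_hid = 40, 8  # assumed sizes, not taken from the paper
fwd = GRUCell(n_fbank, n_hid, rng)
bwd = GRUCell(n_fbank, n_hid, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_hid)]
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_fbank)] for _ in range(20)]
p_intoxicated = classify(frames, fwd, bwd, w_out, 0.0)
print(p_intoxicated)
```

With untrained weights the printed score is meaningless; the point is only the data flow, in which each utterance becomes a sequence of FBANK frames, two GRUs read that sequence in opposite directions, and their states feed a binary decision.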

Item type: Conference publication
Published: 2016
Author(s): Berninger, Kim ; Hoppe, Jannis ; Milde, Benjamin
Entry type: Bibliography
Title: Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network
Language: German
Publication date: September 2016
Book title: International Conference on Text, Speech, and Dialogue
Series: Lecture Notes in Computer Science (LNCS)
Series volume: 9924
DOI: 10.1007/978-3-319-45510-5_50
ID number: TUD-CS-2016-14712
Department(s)/field(s): 20 Department of Computer Science > Telecooperation
20 Department of Computer Science
Date deposited: 16 Mar 2017 12:04
Last modified: 15 May 2018 12:01