Berninger, Kim ; Hoppe, Jannis ; Milde, Benjamin (2016)
Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network.
doi: 10.1007/978-3-319-45510-5_50
Conference publication, Bibliography
Abstract
With the increasing popularity of deep learning approaches in speech recognition and classification, many such problems are undergoing a paradigm shift from classic approaches, such as hidden Markov models, to recurrent neural networks (RNNs). In this paper, we examine that transition for the ALC corpus, which was used in the Interspeech 2011 Speaker State Challenge. Filter bank (FBANK) features are used alongside two types of bidirectional RNNs, each using gated recurrent units (GRUs). These models classify the intoxication state of speakers from recordings of their voices alone and outperform humans with state-of-the-art results.
Item type: | Conference publication |
---|---|
Published: | 2016 |
Author(s): | Berninger, Kim ; Hoppe, Jannis ; Milde, Benjamin |
Entry type: | Bibliography |
Title: | Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network |
Language: | German |
Publication date: | September 2016 |
Book title: | International Conference on Text, Speech, and Dialogue |
Series: | Lecture Notes in Computer Science (LNCS) |
Series volume: | 9924 |
DOI: | 10.1007/978-3-319-45510-5_50 |
ID number: | TUD-CS-2016-14712 |
Department(s)/Research area(s): | 20 Department of Computer Science > Telecooperation; 20 Department of Computer Science |
Date deposited: | 16 Mar 2017 12:04 |
Last modified: | 15 May 2018 12:01 |