El-Hindi, Mahmoud ; Muma, Michael ; Zoubir, Abdelhak M. (2022)
Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks.
30th European Signal Processing Conference. Belgrade, Serbia (29.08.2022-02.09.2022)
doi: 10.23919/EUSIPCO55093.2022.9909891
Conference publication, bibliography
Abstract
Speaker diarization systems process audio files by labelling speech segments according to speakers' identities. Many speaker diarization systems work offline and are not suited for online applications. We present a semi-supervised, online, low-complexity system. While, in general, speaker diarization operates in an unsupervised manner, the presented system relies on the enrollment of the participating speakers in the conversation. The diarization system has two main novel aspects. The first one is a proposed online learning strategy that evaluates processed segments according to their usefulness for learning a speaker, i.e., for updating a speaker model with them. Each segment is evaluated using two metrics to determine whether to use it to update the system. The second novel aspect is a proposed vector quantization approach in which the score depends not only on the target speaker codebook but also on an alternative codebook. We also present an approach to compute the alternative codebook. Simulation results show that the proposed system outperforms a comparable system without the proposed online learning strategy and shows benefits, especially for short training lengths.
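The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that the score is the difference between the average quantization distortion under an alternative codebook and under the target speaker codebook, that the alternative codebook pools the codewords of all other enrolled speakers, and that the two update metrics are the best score and its margin over the runner-up. All function names, thresholds, and the learning rate are illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Codebook = List[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(frames: List[Vector], codebook: Codebook) -> float:
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def alternative_codebook(codebooks: Dict[str, Codebook], target: str) -> Codebook:
    # ASSUMPTION: pool the codewords of every other enrolled speaker.
    # Requires at least two enrolled speakers.
    return [c for spk, cb in codebooks.items() if spk != target for c in cb]


def score_segment(frames: List[Vector], codebooks: Dict[str, Codebook]) -> Dict[str, float]:
    # ASSUMPTION: score = distortion(alternative) - distortion(target),
    # so a larger score means the target codebook fits the segment better.
    return {
        spk: avg_distortion(frames, alternative_codebook(codebooks, spk))
        - avg_distortion(frames, cb)
        for spk, cb in codebooks.items()
    }


def diarize_and_maybe_update(
    frames: List[Vector],
    codebooks: Dict[str, Codebook],
    min_score: float = 0.5,   # hypothetical threshold on the best score
    min_margin: float = 0.5,  # hypothetical threshold on the margin to the runner-up
    rate: float = 0.1,        # hypothetical learning rate for codeword updates
) -> Tuple[str, bool]:
    """Label a segment and update the winning codebook only if both metrics pass."""
    scores = score_segment(frames, codebooks)
    ranked = sorted(scores, key=lambda spk: scores[spk], reverse=True)
    best = ranked[0]
    margin = scores[best] - scores[ranked[1]]
    update = scores[best] >= min_score and margin >= min_margin
    if update:
        cb = codebooks[best]
        for f in frames:
            # Move the nearest codeword a small step toward the frame.
            i = min(range(len(cb)), key=lambda k: sq_dist(f, cb[k]))
            cb[i] = [c + rate * (x - c) for c, x in zip(cb[i], f)]
    return best, update


if __name__ == "__main__":
    enrolled = {"alice": [[0.0, 0.0], [1.0, 0.0]], "bob": [[5.0, 5.0], [6.0, 5.0]]}
    print(diarize_and_maybe_update([[0.1, 0.0], [0.9, 0.1]], enrolled))
```

Gating the update on both a score and a margin is one simple way to keep ambiguous segments from contaminating a speaker model; the actual metrics used in the paper may differ.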
Item type: | Conference publication |
---|---|
Published: | 2022 |
Author(s): | El-Hindi, Mahmoud ; Muma, Michael ; Zoubir, Abdelhak M. |
Type of entry: | Bibliography |
Title: | Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks |
Language: | English |
Publication date: | 18 October 2022 |
Publisher: | IEEE |
Book title: | 30th European Signal Processing Conference (EUSIPCO 2022): Proceedings |
Event title: | 30th European Signal Processing Conference |
Event location: | Belgrade, Serbia |
Event dates: | 29.08.2022-02.09.2022 |
DOI: | 10.23919/EUSIPCO55093.2022.9909891 |
URL / URN: | https://ieeexplore.ieee.org/document/9909891 |
Uncontrolled keywords: | emergenCITY, emergenCITY_CPS |
Department(s)/research area(s): | 18 Department of Electrical Engineering and Information Technology; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Robust Data Science; 18 Department of Electrical Engineering and Information Technology > Institute of Telecommunications > Signal Processing |
Date deposited: | 09 Dec 2022 09:13 |
Last modified: | 06 Jun 2023 15:58 |
PPN: | 508349737 |