Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks

El-Hindi, Mahmoud ; Muma, Michael ; Zoubir, Abdelhak M. (2022)
Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks.
30th European Signal Processing Conference. Belgrade, Serbia (29.08.2022-02.09.2022)
doi: 10.23919/EUSIPCO55093.2022.9909891
Conference publication, Bibliography

Abstract

Speaker diarization systems process audio files by labelling speech segments according to speakers' identities. Many speaker diarization systems work offline and are not suited for online applications. We present a semi-supervised, online, low-complexity system. While speaker diarization generally operates in an unsupervised manner, the presented system relies on the enrollment of the speakers participating in the conversation. The diarization system has two main novel aspects. The first is a proposed online learning strategy that evaluates processed segments according to their usefulness for learning a speaker, i.e., for updating a speaker model with them. Each segment is evaluated using two metrics to determine whether it should be used to update the system. The second novel aspect is a proposed vector quantization approach in which the score depends not only on the target speaker codebook but also on an alternative codebook. We also present an approach to compute the alternative codebook. Simulation results show that the proposed system outperforms a comparable system without the proposed online learning strategy and shows benefits, especially for short training lengths.
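
As a rough illustration of the idea of scoring a segment against both a target codebook and an alternative codebook, the following minimal sketch may help. It is not the authors' method: the abstract does not give the score formula or the way the alternative codebook is computed. The sketch assumes the score is the difference of mean nearest-codeword distances, and that the alternative codebook for a speaker is simply the pooled codewords of the other enrolled speakers. All function names and the toy 2-D features are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def vq_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword."""
    return sum(min(dist(f, c) for c in codebook) for f in frames) / len(frames)


def relative_score(frames, target_codebook, alternative_codebook):
    """Assumed score: positive when the target codebook fits the segment
    better than the alternative codebook does."""
    return (vq_distortion(frames, alternative_codebook)
            - vq_distortion(frames, target_codebook))


def assign_speaker(frames, speaker_codebooks, alternative_codebooks):
    """Label a segment with the enrolled speaker that has the highest score."""
    scores = {
        name: relative_score(frames, speaker_codebooks[name],
                             alternative_codebooks[name])
        for name in speaker_codebooks
    }
    return max(scores, key=scores.get), scores


# Toy enrollment: two speakers with two codewords each (made-up values).
speaker_codebooks = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(5.0, 5.0), (6.0, 5.0)],
}
# Assumption: the alternative codebook pools the other speakers' codewords.
alternative_codebooks = {
    name: [c for other, cb in speaker_codebooks.items() if other != name
           for c in cb]
    for name in speaker_codebooks
}

segment = [(0.1, 0.0), (0.9, 0.1)]  # frames lying close to speaker A
label, scores = assign_speaker(segment, speaker_codebooks,
                               alternative_codebooks)
print(label, scores)
```

In this toy setup the segment lies near speaker A's codewords, so A receives a positive score and B a negative one. A relative score of this kind can be compared across speakers more easily than a raw distortion, which is one plausible motivation for bringing in an alternative codebook.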

Item type: Conference publication
Published: 2022
Author(s): El-Hindi, Mahmoud ; Muma, Michael ; Zoubir, Abdelhak M.
Entry type: Bibliography
Title: Semi-Supervised Online Speaker Diarization using Vector Quantization with Alternative Codebooks
Language: English
Publication date: 18 October 2022
Publisher: IEEE
Book title: 30th European Signal Processing Conference (EUSIPCO 2022): Proceedings
Event title: 30th European Signal Processing Conference
Event location: Belgrade, Serbia
Event date: 29.08.2022-02.09.2022
DOI: 10.23919/EUSIPCO55093.2022.9909891
URL / URN: https://ieeexplore.ieee.org/document/9909891
Uncontrolled keywords: emergenCITY, emergenCITY_CPS
Department(s)/Field(s): 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications > Robust Data Science
18 Department of Electrical Engineering and Information Technology > Institute for Telecommunications > Signal Processing
Date deposited: 09 Dec 2022 09:13
Last modified: 06 Jun 2023 15:58
PPN: 508349737