Delving Deeper into Cross-lingual Visual Question Answering

Liu, Chen ; Pfeiffer, Jonas ; Korhonen, Anna ; Vulic, Ivan ; Gurevych, Iryna (2023)
Delving Deeper into Cross-lingual Visual Question Answering.
17th Conference of the European Chapter of the Association for Computational Linguistics. Dubrovnik, Croatia (02.05.2023-06.05.2023)
doi: 10.18653/v1/2023.findings-eacl.186
Conference publication, bibliography

Abstract

Visual question answering (VQA) is one of the crucial vision-and-language tasks. Yet, existing VQA research has mostly focused on the English language, due to a lack of suitable evaluation resources. Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers with large gaps to monolingual performance, without any deeper analysis. In this work, we delve deeper into the different aspects of cross-lingual VQA, aiming to understand the impact of 1) modeling methods and choices, including architecture, inductive bias, fine-tuning; 2) learning biases: including question types and modality biases in cross-lingual setups. The key results of our analysis are: 1. We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance, yielding +10 accuracy points over existing methods. 2. We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers, and identify question types that are the most difficult to improve on. 3. We provide an analysis of modality biases present in training data and models, revealing why zero-shot performance gaps remain for certain question types and languages.

Item type: Conference publication
Published: 2023
Author(s): Liu, Chen ; Pfeiffer, Jonas ; Korhonen, Anna ; Vulic, Ivan ; Gurevych, Iryna
Entry type: Bibliography
Title: Delving Deeper into Cross-lingual Visual Question Answering
Language: English
Publication date: 2 May 2023
Publisher: ACL
Book title: The 17th Conference of the European Chapter of the Association for Computational Linguistics - findings of EACL 2023
Event title: 17th Conference of the European Chapter of the Association for Computational Linguistics
Event location: Dubrovnik, Croatia
Event dates: 02.05.2023-06.05.2023
DOI: 10.18653/v1/2023.findings-eacl.186
URL / URN: https://aclanthology.org/2023.findings-eacl.186/
Uncontrolled keywords: UKP_p_MISRIK, UKP_p_emergencity, emergenCITY, emergenCITY_INF
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
LOEWE
LOEWE > LOEWE-Zentren
LOEWE > LOEWE-Zentren > emergenCITY
TU projects: HMWK|LOEWE|emergenC TP Gurevych
Date deposited: 12 Jun 2023 12:34
Last modified: 19 Jan 2024 18:30
PPN: 510470572