GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Wang, Kexin ; Thakur, Nandan ; Reimers, Nils ; Gurevych, Iryna (2022)
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval.
2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle, USA (10-15 July 2022)
Conference or Workshop Item, Bibliography

Abstract

Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data, which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021), can be combined with GPL, yielding another average improvement of 1.4 points nDCG@10 across the six tasks.

Item Type: Conference or Workshop Item
Published: 2022
Creators: Wang, Kexin ; Thakur, Nandan ; Reimers, Nils ; Gurevych, Iryna
Type of entry: Bibliography
Title: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
Language: English
Date: 11 July 2022
Publisher: Association for Computational Linguistics
Book Title: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Event Title: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Event Location: Seattle, USA
Event Dates: 10-15 July 2022
URL / URN: https://aclanthology.org/2022.naacl-main.168

Uncontrolled Keywords: UKP_p_square
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 18 Jul 2022 08:29
Last Modified: 18 Nov 2022 08:16
PPN: 501768351