TU Darmstadt / ULB / TUbiblio

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Thakur, Nandan ; Wang, Kexin ; Gurevych, Iryna ; Lin, Jimmy (2023)
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval.
46th International ACM SIGIR Conference on Research and Development in Information Retrieval. Taipei, Taiwan (23.07.2023-27.07.2023)
doi: 10.1145/3539618.3591902
Konferenzveröffentlichung, Bibliographie

Kurzbeschreibung (Abstract)

Traditionally, sparse retrieval systems relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is, that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, a key requirement in practical retrieval systems requires models that can generalize well to unseen out-of-domain, i.e. zero-shot retrieval tasks. In this work, we provide SPRINT, a unified python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document which is often crucial for its performance gains, i.e. a limitation among its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly here: https://github.com/thakur-nandan/sprint.

Typ des Eintrags: Konferenzveröffentlichung
Erschienen: 2023
Autor(en): Thakur, Nandan ; Wang, Kexin ; Gurevych, Iryna ; Lin, Jimmy
Art des Eintrags: Bibliographie
Titel: SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
Sprache: Englisch
Publikationsjahr: 18 Juli 2023
Verlag: ACM
Buchtitel: SIGIR'23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Veranstaltungstitel: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Veranstaltungsort: Taipei, Taiwan
Veranstaltungsdatum: 23.07.2023-27.07.2023
DOI: 10.1145/3539618.3591902
Kurzbeschreibung (Abstract):

Traditionally, sparse retrieval systems relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is, that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, a key requirement in practical retrieval systems requires models that can generalize well to unseen out-of-domain, i.e. zero-shot retrieval tasks. In this work, we provide SPRINT, a unified python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document which is often crucial for its performance gains, i.e. a limitation among its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly here: https://github.com/thakur-nandan/sprint.

Freie Schlagworte: UKP_p_square
Fachbereich(e)/-gebiet(e): 20 Fachbereich Informatik
20 Fachbereich Informatik > Ubiquitäre Wissensverarbeitung
Hinterlegungsdatum: 07 Aug 2023 10:43
Letzte Änderung: 07 Aug 2023 14:41
PPN: 510423965
Export:
Suche nach Titel in: TUfind oder in Google
Frage zum Eintrag Frage zum Eintrag

Optionen (nur für Redakteure)
Redaktionelle Details anzeigen Redaktionelle Details anzeigen