SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Thakur, Nandan ; Wang, Kexin ; Gurevych, Iryna ; Lin, Jimmy (2023)
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval.
46th International ACM SIGIR Conference on Research and Development in Information Retrieval. Taipei, Taiwan (23-27 July 2023)
doi: 10.1145/3539618.3591902
Conference or Workshop Item, Bibliography

Abstract

Traditionally, sparse retrieval systems that rely on lexical representations to retrieve documents, such as BM25, have dominated information retrieval tasks. With the advent of pre-trained transformer models such as BERT, neural sparse retrieval has introduced a new paradigm within retrieval. Despite this success, there has been little software that supports running different sparse retrievers in a unified, common environment, which hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is that a majority of prior work evaluates sparse retrieval models only in-domain, i.e. on a single dataset: MS MARCO. However, practical retrieval systems require models that generalize well to unseen out-of-domain, i.e. zero-shot, retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene that supports a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2, and SPLADEv2. Users can also easily add customized models by defining their term-weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines on the well-acknowledged BEIR benchmark. Our results demonstrate that SPLADEv2 achieves the best average score among all neural sparse retrievers, 0.470 nDCG@10 on BEIR, and we further uncover the reasons behind this gain. We show that SPLADEv2 produces sparse representations in which a majority of tokens lie outside the original query and document, an expansion that is often crucial to its performance and a capability its sparse counterparts largely lack. We make our SPRINT toolkit, models, and the data used in our experiments publicly available at: https://github.com/thakur-nandan/sprint.
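The abstract's key finding concerns term expansion: SPLADEv2 scores every vocabulary term for a given text, not only the terms that appear in it. This record does not include SPRINT's actual API, so the following is a minimal sketch of SPLADE-style term weighting built directly on the Hugging Face transformers library; the checkpoint name naver/splade_v2_max and the helper term_weights are illustrative assumptions, not part of the SPRINT toolkit.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL = "naver/splade_v2_max"  # assumed checkpoint name, for illustration only
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForMaskedLM.from_pretrained(MODEL)
    model.eval()

    def term_weights(text):
        # SPLADE pooling: w_j = max_i log(1 + ReLU(logits_ij)).
        # Because every vocabulary entry is scored, terms absent from the
        # input text can receive non-zero weight -- the expansion behaviour
        # the paper identifies as crucial to SPLADEv2's zero-shot gains.
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits      # (1, seq_len, vocab_size)
        w = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)
        ids = w.nonzero().squeeze(1)
        return {tokenizer.convert_ids_to_tokens(i.item()): round(w[i].item(), 3)
                for i in ids}

    top = sorted(term_weights("what is neural sparse retrieval?").items(),
                 key=lambda kv: -kv[1])[:10]
    print(top)  # highest-weighted terms, typically including expansion terms

In SPRINT itself, a function of this shape would correspond to the term-weighting method a user defines to plug a custom sparse model into the toolkit's common evaluation interface.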

Item Type: Conference or Workshop Item
Published: 2023
Creators: Thakur, Nandan ; Wang, Kexin ; Gurevych, Iryna ; Lin, Jimmy
Type of entry: Bibliography
Title: SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
Language: English
Date: 18 July 2023
Publisher: ACM
Book Title: SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Event Title: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Event Location: Taipei, Taiwan
Event Dates: 23-27 July 2023
DOI: 10.1145/3539618.3591902
Uncontrolled Keywords: UKP_p_square
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 07 Aug 2023 10:43
Last Modified: 07 Aug 2023 14:41
PPN: 510423965