
MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Cai, Fengyu ; Zhao, Xinran ; Chen, Tong ; Chen, Sihao ; Zhang, Hongming ; Gurevych, Iryna (2024)
MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity.
29th Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-main.579
Conference publication, Bibliography

Abstract

Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., retrieval-augmented generation (RAG), within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate these prevalent challenges, this paper introduces MixGR, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. MixGR fuses various metrics based on these granularities into a unified score that reflects a comprehensive query-document similarity. Our experiments demonstrate that MixGR outperforms previous document retrieval by 24.7%, 9.8%, and 6.9% on nDCG@5 with unsupervised, supervised, and LLM-based retrievers, respectively, averaged over queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy on two downstream scientific question-answering tasks highlights the advantage of MixGR in boosting the application of LLMs in the scientific domain. The code and experimental datasets are available.
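The abstract describes fusing similarity metrics computed at query/subquery and document/proposition granularity into one score per document, but does not spell out the fusion function. The following is a minimal illustrative sketch in Python, not the authors' implementation: it assumes reciprocal rank fusion (RRF) as the zero-shot fusion choice, max-pooling to reduce proposition-level similarities to one score per document, and cosine similarity over placeholder embeddings standing in for any dense retriever.

import numpy as np

# Minimal sketch, assuming RRF fusion and max-pooled proposition scores;
# the paper's exact fusion formulation is not given in this record.

def cosine(q, mat):
    # Cosine similarity between vector q and each row of mat.
    return (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)

def rrf(scores, k=60):
    # Reciprocal-rank-fusion contribution 1 / (k + rank), rank 1 = best.
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return 1.0 / (k + ranks)

def fused_scores(query, subqueries, docs, props_per_doc, k=60):
    # One similarity metric per granularity pairing: (query, doc),
    # (query, proposition), (subquery, doc), (subquery, proposition).
    prop_score = lambda q: np.array([cosine(q, p).max() for p in props_per_doc])
    metrics = [cosine(query, docs), prop_score(query)]
    for sq in subqueries:
        metrics += [cosine(sq, docs), prop_score(sq)]
    # Fuse all granularity-level metrics into a single score per document.
    return sum(rrf(m, k) for m in metrics)

# Toy usage: random vectors stand in for dense-retriever embeddings.
rng = np.random.default_rng(0)
dim = 8
query = rng.normal(size=dim)
subqueries = [rng.normal(size=dim) for _ in range(2)]          # decomposed query
docs = rng.normal(size=(5, dim))                               # 5 documents
props_per_doc = [rng.normal(size=(3, dim)) for _ in range(5)]  # 3 propositions each
ranking = np.argsort(-fused_scores(query, subqueries, docs, props_per_doc))
print("documents ranked by fused score:", ranking)

Rank-based fusion is used here because raw similarity scores from different granularities are not directly comparable; RRF sidesteps score calibration entirely, which fits the zero-shot setting the abstract describes.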

Type of entry: Conference publication
Published: 2024
Author(s): Cai, Fengyu ; Zhao, Xinran ; Chen, Tong ; Chen, Sihao ; Zhang, Hongming ; Gurevych, Iryna
Entry type: Bibliography
Title: MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity
Language: English
Year of publication: November 2024
Publisher: ACL
Book title: EMNLP 2024: The 2024 Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference
Event title: 29th Conference on Empirical Methods in Natural Language Processing
Event location: Miami, USA
Event dates: 12.11.2024 - 16.11.2024
DOI: 10.18653/v1/2024.emnlp-main.579
URL / URN: https://aclanthology.org/2024.emnlp-main.579/
Free keywords: UKP_p_seditrah_QABioLit
Department(s)/Field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 09 Dec 2024 12:59
Last modified: 09 Dec 2024 12:59