Cai, Fengyu ; Zhao, Xinran ; Chen, Tong ; Chen, Sihao ; Zhang, Hongming ; Gurevych, Iryna (2024)
MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity.
29th Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-main.579
Conference publication, Bibliography
Abstract
Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces MixGR, which improves dense retrievers’ awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. MixGR fuses various metrics based on these granularities to a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that MixGR outperforms previous document retrieval by 24.7%, 9.8%, and 6.9% on nDCG@5 with unsupervised, supervised, and LLM-based retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of MixGR to boost the application of LLMs in the scientific domain. The code and experimental datasets are available.
Item type: | Conference publication |
---|---|
Published: | 2024 |
Author(s): | Cai, Fengyu ; Zhao, Xinran ; Chen, Tong ; Chen, Sihao ; Zhang, Hongming ; Gurevych, Iryna |
Type of entry: | Bibliography |
Title: | MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity |
Language: | English |
Publication date: | November 2024 |
Publisher: | ACL |
Book title: | EMNLP 2024: The 2024 Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference |
Event title: | 29th Conference on Empirical Methods in Natural Language Processing |
Event location: | Miami, USA |
Event dates: | 12.11.2024 - 16.11.2024 |
DOI: | 10.18653/v1/2024.emnlp-main.579 |
URL / URN: | https://aclanthology.org/2024.emnlp-main.579/ |
Uncontrolled keywords: | UKP_p_seditrah_QABioLit |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 09 Dec 2024 12:59 |
Last modified: | 09 Dec 2024 12:59 |