
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers

Wang, Yuxia ; Reddy, Revanth Gangi ; Mujahid, Zain Muhammad ; Arora, Arnav ; Rubashevskii, Aleksandr ; Geng, Jiahui ; Afzal, Osama Mohammed ; Pan, Liangming ; Borenstein, Nadav ; Pillai, Aditya ; Augenstein, Isabelle ; Gurevych, Iryna ; Nakov, Preslav (2024)
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers.
29th Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.findings-emnlp.830
Conference publication, Bibliography

Abstract

The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not just the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark at three levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, annotated data, benchmark, and code available at https://github.com/yuxiaw/Factcheck-GPT.

Entry type: Conference publication
Published: 2024
Author(s): Wang, Yuxia ; Reddy, Revanth Gangi ; Mujahid, Zain Muhammad ; Arora, Arnav ; Rubashevskii, Aleksandr ; Geng, Jiahui ; Afzal, Osama Mohammed ; Pan, Liangming ; Borenstein, Nadav ; Pillai, Aditya ; Augenstein, Isabelle ; Gurevych, Iryna ; Nakov, Preslav
Entry category: Bibliography
Title: Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Language: English
Publication year: November 2024
Publisher: ACL
Book title: EMNLP 2024: The 2024 Conference on Empirical Methods in Natural Language Processing: Findings of EMNLP 2024
Event title: 29th Conference on Empirical Methods in Natural Language Processing
Event location: Miami, USA
Event dates: 12.11.2024 - 16.11.2024
DOI: 10.18653/v1/2024.findings-emnlp.830
URL / URN: https://aclanthology.org/2024.findings-emnlp.830/

Department(s)/Field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 17 Dec 2024 11:43
Last modified: 17 Dec 2024 11:43