OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav (2024)
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs.
2024 Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-demo.23
Conference publication, Bibliography

Abstract

The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.

Type of entry: Conference publication
Published: 2024
Author(s): Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav
Type of record: Bibliography
Title: OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Language: English
Year of publication: November 2024
Publisher: ACL
Book title: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Event title: 2024 Conference on Empirical Methods in Natural Language Processing
Event location: Miami, USA
Event dates: 12.11.2024 - 16.11.2024
DOI: 10.18653/v1/2024.emnlp-demo.23
URL / URN: https://aclanthology.org/2024.emnlp-demo.23/

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 28 Nov 2024 08:45
Last modified: 28 Nov 2024 08:45