OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav (2024)
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs.
2024 Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-demo.23
Conference publication, Bibliography

Abstract

The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.

Type of entry: Conference publication
Published: 2024
Author(s): Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav
Type of record: Bibliography
Title: OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Language: English
Year of publication: November 2024
Publisher: ACL
Book title: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Event title: 2024 Conference on Empirical Methods in Natural Language Processing
Event location: Miami, USA
Event dates: 12.11.2024 - 16.11.2024
DOI: 10.18653/v1/2024.emnlp-demo.23
URL / URN: https://aclanthology.org/2024.emnlp-demo.23/

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 28 Nov 2024 08:45
Last modified: 28 Nov 2024 08:45