Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav (2024)
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs.
2024 Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-demo.23
Conference publication, Bibliography
Abstract
The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.
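The three-module design from the abstract can be sketched in plain Python. This is an illustrative sketch only: the class and function names below (`ResponseEval`, `llm_eval`, `checker_eval`, the toy `KB`) are invented for this example and are not the actual API of the `openfactcheck` package.

```python
from dataclasses import dataclass
from typing import Callable, List

Verdict = bool  # True = claim judged factual

@dataclass
class ResponseEval:
    """Customizable fact-checker: split a response into claims, verify each.
    Mirrors the abstract's RESPONSEEVAL module (hypothetical interface)."""
    extract_claims: Callable[[str], List[str]]
    verify_claim: Callable[[str], Verdict]

    def check(self, response: str) -> List[Verdict]:
        return [self.verify_claim(c) for c in self.extract_claims(response)]

def llm_eval(checker: ResponseEval, responses: List[str]) -> float:
    """LLM-level factuality (cf. LLMEVAL): fraction of claims judged true
    across a set of model responses."""
    verdicts = [v for r in responses for v in checker.check(r)]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0

def checker_eval(predicted: List[Verdict], gold: List[Verdict]) -> float:
    """Checker-level evaluation (cf. CHECKEREVAL): agreement of a checker's
    verdicts with gold claim labels."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy instantiation: claims are sentences; a claim counts as "factual" if it
# appears in a tiny knowledge base (a stand-in for retrieval + verification).
KB = {"Paris is the capital of France", "Water boils at 100 C"}
checker = ResponseEval(
    extract_claims=lambda r: [s.strip() for s in r.split(".") if s.strip()],
    verify_claim=lambda c: c in KB,
)

score = llm_eval(checker, ["Paris is the capital of France. The moon is cheese."])
print(score)  # one of two claims verified -> 0.5
```

The point of the sketch is the separation of concerns the paper describes: the claim extractor and verifier are pluggable, the LLM-level score aggregates per-claim verdicts, and the checker itself can be scored against gold labels.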
| Type of entry: | Conference publication |
|---|---|
| Published: | 2024 |
| Author(s): | Iqbal, Hasan ; Wang, Yuxia ; Wang, Minghan ; Georgiev, Georgi Nenkov ; Geng, Jiahui ; Gurevych, Iryna ; Nakov, Preslav |
| Type of record: | Bibliography |
| Title: | OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs |
| Language: | English |
| Year of publication: | November 2024 |
| Publisher: | ACL |
| Book title: | Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations |
| Event title: | 2024 Conference on Empirical Methods in Natural Language Processing |
| Event location: | Miami, USA |
| Event dates: | 12.11.2024 - 16.11.2024 |
| DOI: | 10.18653/v1/2024.emnlp-demo.23 |
| URL / URN: | https://aclanthology.org/2024.emnlp-demo.23/ |
| Department(s)/Field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
| Date deposited: | 28 Nov 2024 08:45 |
| Last modified: | 28 Nov 2024 08:45 |