
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

Waldis, Andreas ; Perlitz, Yotam ; Choshen, Leshem ; Hou, Yufang ; Gurevych, Iryna (2024)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models.
In: Transactions of the Association for Computational Linguistics, 12
doi: 10.1162/tacl_a_00718
Article, Bibliography

Abstract

We introduce Holmes, a new benchmark designed to assess language models’ (LMs’) linguistic competence—their unconscious understanding of linguistic phenomena. Specifically, we use classifier-based probing to examine LMs’ internal representations regarding distinct linguistic phenomena (e.g., part-of-speech tagging). As a result, we meet recent calls to disentangle LMs’ linguistic competence from other cognitive abilities, such as following instructions in prompting-based evaluations. Composing Holmes, we review over 270 probing studies and include more than 200 datasets to assess syntax, morphology, semantics, reasoning, and discourse phenomena. Analyzing over 50 LMs reveals that, aligned with known trends, their linguistic competence correlates with model size. However, surprisingly, model architecture and instruction tuning also significantly influence performance, particularly in morphology and syntax. Finally, we propose FlashHolmes, a streamlined version that reduces the computation load while maintaining high-ranking precision.
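The classifier-based probing the abstract describes can be illustrated with a minimal sketch (not the Holmes codebase): a linear probe is fit on frozen representations to predict a linguistic label such as a part-of-speech class, and its held-out accuracy indicates how strongly the phenomenon is encoded. All data here are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen LM hidden states: 600 tokens with 64-dim vectors and
# a binary linguistic label (e.g. noun vs. verb). A small class-dependent
# shift in the first dimensions mimics linguistically encoded signal.
labels = rng.integers(0, 2, size=600)
reps = rng.normal(size=(600, 64))
reps[:, :4] += labels[:, None] * 2.0

# Simple train/test split.
train, test = slice(0, 450), slice(450, 600)

# The probe: a linear model fit by least squares on the frozen vectors
# (the LM itself is never updated -- only the probe is trained).
X = np.hstack([reps, np.ones((600, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X[train], labels[train].astype(float), rcond=None)

# Accuracy near 1.0 suggests the representations encode the phenomenon;
# chance level (~0.5) suggests they do not.
preds = (X[test] @ w > 0.5).astype(int)
accuracy = (preds == labels[test]).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Because the probe is deliberately weak (linear), high accuracy is evidence that the information is readily available in the representations rather than computed by the probe itself.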

Type of entry: Article
Published: 2024
Author(s): Waldis, Andreas ; Perlitz, Yotam ; Choshen, Leshem ; Hou, Yufang ; Gurevych, Iryna
Entry type: Bibliography
Title: Holmes: A Benchmark to Assess the Linguistic Competence of Language Models
Language: English
Publication date: 4 December 2024
Publisher: MIT Press
Journal, newspaper, or series title: Transactions of the Association for Computational Linguistics
Journal volume: 12
DOI: 10.1162/tacl_a_00718

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Deposit date: 16 Jan 2025 11:59
Last modified: 16 Jan 2025 11:59