Waldis, Andreas ; Perlitz, Yotam ; Choshen, Leshem ; Hou, Yufang ; Gurevych, Iryna (2024)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models.
In: Transactions of the Association for Computational Linguistics, 12
doi: 10.1162/tacl_a_00718
Article, Bibliography
Abstract
We introduce Holmes, a new benchmark designed to assess language models’ (LMs’) linguistic competence—their unconscious understanding of linguistic phenomena. Specifically, we use classifier-based probing to examine LMs’ internal representations regarding distinct linguistic phenomena (e.g., part-of-speech tagging). As a result, we meet recent calls to disentangle LMs’ linguistic competence from other cognitive abilities, such as following instructions in prompting-based evaluations. Composing Holmes, we review over 270 probing studies and include more than 200 datasets to assess syntax, morphology, semantics, reasoning, and discourse phenomena. Analyzing over 50 LMs reveals that, aligned with known trends, their linguistic competence correlates with model size. However, surprisingly, model architecture and instruction tuning also significantly influence performance, particularly in morphology and syntax. Finally, we propose FlashHolmes, a streamlined version that reduces the computation load while maintaining high-ranking precision.
Type of entry: | Article |
---|---|
Published: | 2024 |
Author(s): | Waldis, Andreas ; Perlitz, Yotam ; Choshen, Leshem ; Hou, Yufang ; Gurevych, Iryna |
Type of record: | Bibliography |
Title: | Holmes: A Benchmark to Assess the Linguistic Competence of Language Models |
Language: | English |
Date of publication: | 4 December 2024 |
Publisher: | MIT Press |
Journal or series title: | Transactions of the Association for Computational Linguistics |
Volume: | 12 |
DOI: | 10.1162/tacl_a_00718 |
Division(s): | 20 Department of Computer Science 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 16 Jan 2025 11:59 |
Last modified: | 16 Jan 2025 11:59 |