
GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Cai, Fengyu ; Zhao, Xinran ; Zhang, Hongming ; Gurevych, Iryna ; Koeppl, Heinz (2024)
GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics.
62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand (12.08.2024 - 16.08.2024)
Conference publication, bibliography

Abstract

Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning and is it generalizable across datasets? To answer this question, this work formally initiates the concept of class-wise hardness. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with instance-level metrics in previous works. To address this, we propose GeoHard for class-wise hardness measurement by modeling class geometry in the semantic embedding space. GeoHard surpasses instance-level metrics by over 59 percent on Pearson's correlation on measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of GeoHard as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.
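Illustrative sketch (not part of the catalog record and not the paper's formula): the abstract refers to modeling class geometry in an embedding space and to evaluating a hardness measure by Pearson's correlation. The toy code below shows one way such quantities could be computed. The function names, the spread/separation statistics, the `spread / separation` score, and all numbers are hypothetical assumptions chosen only for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def class_geometry(embeddings_by_class):
    """Return {class: (spread, separation)} for toy embeddings.

    spread:     mean Euclidean distance of a class's points to its centroid.
    separation: distance from the class centroid to the nearest other centroid.
    Both statistics are assumptions for this sketch, not GeoHard's definition.
    """
    centroids = {c: centroid(v) for c, v in embeddings_by_class.items()}
    stats = {}
    for c, vecs in embeddings_by_class.items():
        spread = sum(math.dist(v, centroids[c]) for v in vecs) / len(vecs)
        separation = min(
            math.dist(centroids[c], centroids[o]) for o in centroids if o != c
        )
        stats[c] = (spread, separation)
    return stats


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical 2-D "embeddings" for three classes.
toy = {
    "A": [[0.0, 0.0], [0.0, 2.0]],
    "B": [[3.0, 1.0], [5.0, 1.0]],
    "C": [[20.0, 1.0], [20.0, 1.5]],
}
geometry = class_geometry(toy)

# Hypothetical score: a class that is spread out and close to a
# neighbouring class is treated as harder.
scores = {c: spread / sep for c, (spread, sep) in geometry.items()}

# Hypothetical per-class error rates from some classifier.
errors = {"A": 0.30, "B": 0.28, "C": 0.05}

classes = sorted(toy)
r = pearson([scores[c] for c in classes], [errors[c] for c in classes])
print(f"Pearson r between toy score and error rate: {r:.3f}")
```

A correlation near 1 would mean the toy score ranks classes much as the error rates do; with only three made-up classes the value carries no empirical weight.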

Item type: Conference publication
Published: 2024
Author(s): Cai, Fengyu ; Zhao, Xinran ; Zhang, Hongming ; Gurevych, Iryna ; Koeppl, Heinz
Type of entry: Bibliography
Title: GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics
Language: English
Publication date: August 2024
Publisher: ACL
Book title: Findings of the Association for Computational Linguistics ACL 2024
Event title: 62nd Annual Meeting of the Association for Computational Linguistics
Event location: Bangkok, Thailand
Event dates: 12.08.2024 - 16.08.2024
URL / URN: https://aclanthology.org/2024.findings-acl.332/

Uncontrolled keywords: UKP_p_seditrah_QABioLit
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 27 Aug 2024 13:21
Last modified: 27 Aug 2024 13:21