TUbiblio (TU Darmstadt / ULB)

Are Emergent Abilities in Large Language Models just In-Context Learning?

Lu, Sheng ; Bigoulaeva, Irina ; Sachdeva, Rachneet ; Madabushi, Harish Tayyar ; Gurevych, Iryna (2024)
Are Emergent Abilities in Large Language Models just In-Context Learning?
62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand (11.08.2024 - 16.08.2024)
Conference publication, Bibliography

Abstract

Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as “emergent abilities,” have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.
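The abstract defines in-context learning as a model's ability to complete a task from a few in-prompt examples, without any parameter updates. A minimal sketch of how such a few-shot prompt is assembled is shown below; the sentiment-labeling task and the example strings are illustrative assumptions, not taken from the paper.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: demonstrations followed by the query.

    In-context learning means the model infers the task purely from
    these in-prompt demonstrations, with no weight updates.
    """
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Hypothetical sentiment-labeling demonstrations.
demos = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
]
prompt = build_few_shot_prompt(demos, "A thoroughly enjoyable read.")
print(prompt)
```

The completion the model produces for the final `Output:` slot is what the paper's experiments disentangle from genuinely emergent behavior.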

Entry type: Conference publication
Published: 2024
Author(s): Lu, Sheng ; Bigoulaeva, Irina ; Sachdeva, Rachneet ; Madabushi, Harish Tayyar ; Gurevych, Iryna
Entry category: Bibliography
Title: Are Emergent Abilities in Large Language Models just In-Context Learning?
Language: English
Year of publication: August 2024
Publisher: ACL
Book title: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Event title: 62nd Annual Meeting of the Association for Computational Linguistics
Event location: Bangkok, Thailand
Event dates: 11.08.2024 - 16.08.2024
URL / URN: https://aclanthology.org/2024.acl-long.279/
Free keywords: UKP_p_LOEWE_Spitzenprofessur, UKP_p_seditrah_factcheck, UKP_p_PADAM
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 20 Aug 2024 08:54
Last modified: 20 Aug 2024 08:54