Pfeiffer, Jonas (2023)
Modular and Parameter-efficient Fine-tuning of Language Models.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024565
Dissertation, first publication, publisher's version
Abstract
Transfer learning has recently become the dominant paradigm of natural language processing. Models pre-trained on unlabeled data can be fine-tuned for downstream tasks based on only a handful of examples. A long-term goal is to develop models that acquire new information at scale without incurring negative transfer and that generalize systematically to new settings. Modular deep learning has emerged as a promising solution to these challenges, by updating parameter-efficient units of computation locally and asynchronously. These units are often implemented as modules that are interlaid between layers, interpolated with pre-trained parameters, or concatenated to the inputs. Conditioned on tasks or examples, information is routed to multiple modules through a fixed or learned function, followed by an aggregation of their outputs. This property enables compositional generalization, by disentangling knowledge and recombining it in new ways.
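As a rough illustration of the parameter-efficient units of computation described above, here is a minimal PyTorch sketch of a bottleneck module interlaid between transformer layers. The class name, hidden size, and reduction factor are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A parameter-efficient unit of computation of the kind the abstract
    describes: a small bottleneck interlaid between transformer layers.
    Hidden size and reduction factor are illustrative assumptions."""

    def __init__(self, hidden_size: int = 768, reduction: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen pre-trained representation passes
        # through unchanged; only the small bottleneck is trained.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

x = torch.randn(2, 10, 768)          # (batch, sequence, hidden)
print(BottleneckAdapter()(x).shape)  # torch.Size([2, 10, 768])
```

Because the pre-trained weights stay frozen and only the bottleneck is updated, such modules can be trained locally and asynchronously for different tasks and swapped or combined later.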
In this thesis, we provide a unified view of modularity in natural language processing, spanning four dimensions; specifically, we disentangle modularity into computation functions, routing functions, aggregation functions, and the training setting. Along those axes, we propose multiple contributions: a research framework which encompasses all dimensions; a novel attention-based aggregation function which combines the knowledge stored within different modules; routing mechanisms for out-of-distribution generalization in cross-lingual transfer scenarios; a dataset and modular training strategies for multimodal and multilingual transfer learning; and a modular pre-training strategy to tackle catastrophic interference of heterogeneous data.
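The attention-based aggregation mentioned among the contributions can be sketched as follows: a layer's output attends over the outputs of several candidate modules and mixes them by softmax weights. All names and dimensions below are assumptions for illustration, not the thesis's exact formulation.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Sketch of an attention-based aggregation function: the layer output
    queries the outputs of several candidate modules and mixes them with
    softmax weights. Dimensions and names are illustrative assumptions."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, layer_out: torch.Tensor, module_outs: torch.Tensor) -> torch.Tensor:
        # layer_out: (batch, seq, hidden); module_outs: (batch, seq, n_modules, hidden)
        q = self.query(layer_out).unsqueeze(2)        # (b, s, 1, h)
        k = self.key(module_outs)                     # (b, s, n, h)
        v = self.value(module_outs)                   # (b, s, n, h)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5  # (b, s, n)
        weights = scores.softmax(dim=-1)              # attention over modules
        return (weights.unsqueeze(-1) * v).sum(dim=2) # (b, s, h)

layer_out = torch.randn(2, 10, 768)       # output of a transformer layer
module_outs = torch.randn(2, 10, 3, 768)  # outputs of three candidate modules
print(AttentionAggregator()(layer_out, module_outs).shape)  # torch.Size([2, 10, 768])
```

This mirrors the general recipe of routing information to multiple modules and then aggregating their outputs into a single representation.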
Type of entry: | Dissertation
---|---
Published: | 2023
Author(s): | Pfeiffer, Jonas
Type of publication: | First publication
Title: | Modular and Parameter-efficient Fine-tuning of Language Models
Language: | English
Referees: | Gurevych, Prof. Dr. Iryna ; Glavaš, Prof. Dr. Goran ; Vulić, Prof. Dr. Ivan
Date of publication: | 7 November 2023
Place: | Darmstadt
Collation: | xiv, 164 pages
Date of oral examination: | 21 April 2023
DOI: | 10.26083/tuprints-00024565
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/24565
Abstract: | (see above)
Status: | Publisher's version
URN: | urn:nbn:de:tuda-tuprints-245651
Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science
Department(s)/Division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing
TU projects: | HMWK\|LOEWE\|emergenC TP Gurevych
Date deposited: | 07 Nov 2023 15:38
Last modified: | 08 Nov 2023 11:58