
Model Merging by Uncertainty-Based Gradient Matching

Daheim, Nico ; Möllenhoff, Thomas ; Ponti, Edoardo ; Gurevych, Iryna ; Khan, Mohammad Emtiyaz (2024)
Model Merging by Uncertainty-Based Gradient Matching.
12th International Conference on Learning Representations. Vienna, Austria (07.05.2024 - 10.05.2024)
Conference publication, Bibliography

Abstract

Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
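
For orientation, the three existing schemes named in the abstract (averaging, task arithmetic, and Fisher-weighted averaging) can be sketched in their commonly used textbook forms. This is a minimal illustration on flat parameter lists, not the uncertainty-based method proposed in the paper. The function names, the toy parameter values, and the diagonal Fisher estimates are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_average(models: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Merge flat parameter vectors by a normalized weighted average."""
    total = sum(weights)
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]


def task_arithmetic(base: Vector, models: Sequence[Vector],
                    scales: Sequence[float]) -> Vector:
    """Add scaled task vectors (fine-tuned minus base) to the base parameters."""
    return [
        b + sum(a * (m[i] - b) for a, m in zip(scales, models))
        for i, b in enumerate(base)
    ]


def fisher_weighted_average(models: Sequence[Vector], fishers: Sequence[Vector],
                            eps: float = 1e-12) -> Vector:
    """Per-parameter average weighted by diagonal Fisher estimates.

    The Fisher values are assumed to be precomputed; eps avoids division by zero.
    """
    merged = []
    for i in range(len(models[0])):
        num = sum(f[i] * m[i] for f, m in zip(fishers, models))
        den = sum(f[i] for f in fishers) + eps
        merged.append(num / den)
    return merged


# Toy values (hypothetical): a shared base and two fine-tuned models.
base = [0.0, 0.0]
theta_a = [1.0, 0.0]
theta_b = [0.0, 2.0]

avg = weighted_average([theta_a, theta_b], [1.0, 1.0])
ta = task_arithmetic(base, [theta_a, theta_b], [1.0, 1.0])
fw = fisher_weighted_average([theta_a, theta_b], [[3.0, 1.0], [1.0, 3.0]])
```

With these toy values, plain averaging pulls each parameter halfway toward zero, task arithmetic keeps both fine-tuned changes at full scale, and the Fisher-weighted variant leans toward whichever model has the larger Fisher value for a given parameter.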

Item type: Conference publication
Published: 2024
Author(s): Daheim, Nico ; Möllenhoff, Thomas ; Ponti, Edoardo ; Gurevych, Iryna ; Khan, Mohammad Emtiyaz
Entry type: Bibliography
Title: Model Merging by Uncertainty-Based Gradient Matching
Language: English
Publication date: May 2024
Event title: 12th International Conference on Learning Representations
Event location: Vienna, Austria
Event dates: 07.05.2024 - 10.05.2024
URL / URN: https://openreview.net/forum?id=D7KJmfEDQP

Uncontrolled keywords: UKP_p_seditrah_factcheck
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 01 Jul 2024 11:05
Last modified: 01 Jul 2024 11:05