Daheim, Nico ; Möllenhoff, Thomas ; Ponti, Edoardo ; Gurevych, Iryna ; Khan, Mohammad Emtiyaz (2024)
Model Merging by Uncertainty-Based Gradient Matching.
12th International Conference on Learning Representations. Vienna, Austria (07.05.2024 - 10.05.2024)
Conference publication, Bibliography
Abstract
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
Item type: | Conference publication |
---|---|
Published: | 2024 |
Author(s): | Daheim, Nico ; Möllenhoff, Thomas ; Ponti, Edoardo ; Gurevych, Iryna ; Khan, Mohammad Emtiyaz |
Type of entry: | Bibliography |
Title: | Model Merging by Uncertainty-Based Gradient Matching |
Language: | English |
Date of publication: | May 2024 |
Event title: | 12th International Conference on Learning Representations |
Event location: | Vienna, Austria |
Event dates: | 07.05.2024 - 10.05.2024 |
URL / URN: | https://openreview.net/forum?id=D7KJmfEDQP |
Uncontrolled keywords: | UKP_p_seditrah_factcheck |
Division(s)/Field(s): | 20 Department of Computer Science ; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 01 Jul 2024 11:05 |
Last modified: | 01 Jul 2024 11:05 |