Variational Learning is Effective for Large Deep Networks

Shen, Yuesong ; Daheim, Nico ; Cong, Bai ; Nickl, Peter ; Marconi, Gian Maria ; Bazan, Clement ; Yokota, Rio ; Gurevych, Iryna ; Cremers, Daniel ; Khan, Mohammad Emtiyaz ; Möllenhoff, Thomas (2024)
Variational Learning is Effective for Large Deep Networks.
41st International Conference on Machine Learning. Vienna, Austria (21.07.2024 - 27.07.2024)
Conference publication, Bibliography

Abstract

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam’s, but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.
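
The abstract positions IVON as a drop-in replacement for Adam, with uncertainty obtained by sampling weights. Below is a minimal usage sketch based on the README of the linked repository; the names and arguments (ivon.IVON, lr, ess, the sampled_params context manager) are assumptions taken from that README, and the tiny model and random data are placeholders standing in for GPT-2/ResNet-scale training. Consult the repository for the authoritative interface.

    import torch
    import ivon  # optimizer package from https://github.com/team-approx-bayes/ivon

    # Toy stand-in for a large network and its training data.
    model = torch.nn.Sequential(
        torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
    )
    X = torch.randn(512, 20)
    y = torch.randint(0, 2, (512,))

    # IVON replaces torch.optim.Adam; `ess` (effective sample size) is
    # typically set to the training-set size.
    optimizer = ivon.IVON(model.parameters(), lr=0.1, ess=len(X))

    for epoch in range(10):
        # Gradients are computed at weights sampled from the current
        # variational posterior; otherwise the loop is Adam-like.
        with optimizer.sampled_params(train=True):
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(X), y)
            loss.backward()
        optimizer.step()

    # Predictive uncertainty: average class probabilities over weight samples.
    with torch.no_grad():
        probs = []
        for _ in range(8):
            with optimizer.sampled_params(train=False):
                probs.append(torch.softmax(model(X), dim=-1))
        predictive = torch.stack(probs).mean(dim=0)

The point of the sketch is the claimed cost parity: per step, IVON adds only the weight sampling inside the context manager, so the training loop structure and per-iteration cost remain essentially those of Adam.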

Type of entry: Conference publication
Published: 2024
Author(s): Shen, Yuesong ; Daheim, Nico ; Cong, Bai ; Nickl, Peter ; Marconi, Gian Maria ; Bazan, Clement ; Yokota, Rio ; Gurevych, Iryna ; Cremers, Daniel ; Khan, Mohammad Emtiyaz ; Möllenhoff, Thomas
Type of record: Bibliography
Title: Variational Learning is Effective for Large Deep Networks
Language: English
Date of publication: 28 July 2024
Publisher: MLResearch Press
Book title: Proceedings of the 41st International Conference on Machine Learning
Series: Proceedings of Machine Learning Research
Series volume: 235
Event title: 41st International Conference on Machine Learning
Event location: Vienna, Austria
Event dates: 21.07.2024 - 27.07.2024
URL / URN: https://proceedings.mlr.press/v235/shen24b.html

Free keywords: UKP_p_seditrah_factcheck
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 27 Aug 2024 13:13
Last modified: 26 Nov 2024 15:06
PPN: 524138273