
Transformers with Learnable Activation Functions

Fang, Haishuo ; Lee, Ji-Ung ; Moosavi, Nafise Sadat ; Gurevych, Iryna (2023)
Transformers with Learnable Activation Functions.
17th Conference of the European Chapter of the Association for Computational Linguistics. Dubrovnik, Croatia (02.05.2023-06.05.2023)
Conference publication, Bibliography

Abstract

Activation functions can have a significant impact on reducing the topological complexity of input data and, therefore, on improving a model's performance. However, the choice of activation function is seldom discussed or explored in Transformer-based language models. As a common practice, activation functions such as the Gaussian Error Linear Unit (GELU) are chosen beforehand and then remain fixed from pre-training to fine-tuning. In this paper, we investigate the impact of activation functions on Transformer-based models by utilizing rational activation functions (RAFs). In contrast to fixed activation functions (FAFs), RAFs are capable of learning the optimal activation functions from data. Our experiments show that the RAF-based Transformer model (RAFT) achieves better performance than its FAF-based counterpart. For instance, we find that RAFT outperforms its FAF-based counterpart on the GLUE benchmark by 5.71 points when using only 100 training examples and by 2.05 points on SQuAD with all available data. Analyzing the shapes of the learned RAFs further unveils that they vary across different layers and different tasks, opening a promising way to better analyze and understand large, pre-trained language models.
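
To make the core idea concrete, the sketch below shows one way a learnable rational activation might look in PyTorch: a ratio of two polynomials whose coefficients are trained together with the rest of the network, using the common "safe" absolute-value denominator to avoid poles. The class name RationalActivation, the polynomial degrees, the random initialization, and the feed-forward dimensions in the usage line are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Learnable rational activation R(x) = P(x) / Q(x).

    Sketch only: P is a polynomial with trainable coefficients, and Q uses
    the "safe" form 1 + |b_1 x + ... + b_n x^n| so the denominator never
    reaches zero. Degrees and initialization are illustrative assumptions,
    not the paper's exact setup.
    """

    def __init__(self, numerator_degree: int = 5, denominator_degree: int = 4):
        super().__init__()
        # Coefficients a_0 .. a_m of the numerator polynomial P(x).
        self.numerator = nn.Parameter(0.1 * torch.randn(numerator_degree + 1))
        # Coefficients b_1 .. b_n of the denominator polynomial Q(x).
        self.denominator = nn.Parameter(0.1 * torch.randn(denominator_degree))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # P(x) = a_0 + a_1 x + ... + a_m x^m, evaluated with Horner's scheme.
        p = torch.zeros_like(x)
        for a in reversed(self.numerator):
            p = p * x + a
        # b_1 x + ... + b_n x^n, also evaluated with Horner's scheme.
        q = torch.zeros_like(x)
        for b in reversed(self.denominator):
            q = (q + b) * x
        return p / (1.0 + q.abs())

# Hypothetical drop-in replacement for the fixed GELU in a Transformer
# feed-forward block (dimensions are illustrative):
ffn = nn.Sequential(nn.Linear(768, 3072), RationalActivation(), nn.Linear(3072, 768))
```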

Entry type: Conference publication
Published: 2023
Author(s): Fang, Haishuo ; Lee, Ji-Ung ; Moosavi, Nafise Sadat ; Gurevych, Iryna
Type of entry: Bibliography
Title: Transformers with Learnable Activation Functions
Language: English
Publication date: 2 May 2023
Publisher: ACL
Book title: The 17th Conference of the European Chapter of the Association for Computational Linguistics - Findings of EACL 2023
Event title: 17th Conference of the European Chapter of the Association for Computational Linguistics
Event location: Dubrovnik, Croatia
Event dates: 02.05.2023-06.05.2023
URL / URN: https://aclanthology.org/2023.findings-eacl.181/

Free keywords: UKP_p_crisp_senpai, UKP_p_seditrah_factcheck, UKP_p_square
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 12 Jun 2023 12:33
Last modified: 09 Aug 2023 12:49
PPN: 510470017