
Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge

Jäger, Jonas (2019)
Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge.
Technische Universität Darmstadt
Bachelor thesis, first publication

Abstract

Deep learning, the training of deep neural networks, is now indispensable not only in computer science and information technology but also in countless areas of daily life. It is one of the key technologies in the development of artificial intelligence and will remain of great importance in the future, e.g., in the development of autonomous driving. Since the data available for training such (deep) neural networks is inherently limited, a network cannot be prepared for every input it must handle in real-life situations; a solid generalization capability is therefore necessary. This means the ability to extract a general concept from the training data, so that the task underlying the data is properly understood rather than the training data simply being memorized. An essential ingredient of such a generalization capability is regularization.

A regularization procedure causes a neural network to generalize (better) when learning from data. Various widely used regularization procedures exist which, for example, constrain the weights of the neural network (the parameters whose adaptation constitutes learning) or temporarily alter its structure, and thereby implicitly aim to make the network predict better on as yet unseen data.

This thesis presents Self-Imitation Regularization (SIR), an easy-to-implement regularization procedure which, in contrast to the established standard procedures, explicitly addresses the actual objective, the formation of predictions, and only implicitly influences the weights/parameters of the neural network. The existing (dark) knowledge of the learning network is explicitly incorporated into its learning objective, i.e., the error function to be minimized. Since this is the network's own knowledge, fed back to it during learning, the procedure can be seen as a form of self-imitation. For a given data example, the (dark) knowledge contains, on the one hand, information about similarities to other classes (in a classification problem, a neural network predicts classes and thereby classifies the data). On the other hand, it quantifies, relative to other data examples, the network's confidence in its prediction for this example. Intuitively, through self-imitation this information induces a questioning of the correctness of the labels given in the training data and deepens the understanding of correlations and similarities between the classes.
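As a rough illustration of this idea (not the thesis's exact formulation of SIR), a self-distillation-style loss can blend the usual cross-entropy on the hard label with a cross-entropy term against the network's own temperature-softened prediction for the same example. The function names, the blending weight `alpha`, the temperature `T`, and the exact blending below are assumptions made for this sketch only:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution,
    exposing the 'dark knowledge' about class similarities."""
    scaled = [z / T for z in logits]
    m = max(scaled)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def self_imitation_loss(logits, label, own_logits, alpha=0.3, T=2.0):
    """Blend cross-entropy on the hard label with cross-entropy
    against the network's own softened prediction (own_logits),
    e.g. recorded earlier in training.

    NOTE: alpha, T and this blending are illustrative assumptions,
    not necessarily the definition used in the thesis.
    """
    p = softmax(logits)                   # current prediction
    q = softmax(own_logits, T=T)          # own softened targets
    hard = -math.log(p[label])            # usual cross-entropy term
    p_soft = softmax(logits, T=T)
    soft = -sum(qi * math.log(pi) for qi, pi in zip(q, p_soft))
    return (1 - alpha) * hard + alpha * soft
```

With `alpha = 0` the loss reduces to plain cross-entropy; increasing `alpha` shifts weight toward imitating the network's own earlier (softened) beliefs.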

Besides its regularization ability, which is demonstrated with strong evidence of success (partially at statistical significance), SIR stabilizes training, increases data efficiency, and is robust to erroneous data labels, as shown in various experiments. It is also applicable to very deep neural network architectures and can be combined with some standard regularization methods (e.g., dropout and max-norm regularization). The implementation and additional computational effort are very low, and the hyperparameter tuning is simple.

In this thesis, experimental results on the use of SIR are analyzed and evaluated with several procedures, and the learned neural networks are examined closely in order to explain the regularization behavior along with its accompanying properties.

Item type: Bachelor thesis
Published: 2019
Author(s): Jäger, Jonas
Type of entry: First publication
Title: Self-Imitation Regularization: Regularizing Neural Networks by Leveraging Their Dark Knowledge
Language: English
Referees: Fürnkranz, Prof. Dr. Johannes ; Loza Mencía, Dr. Eneldo
Date of publication: 4 June 2019
Place of publication: Darmstadt
Date of oral examination: 23 May 2019
URL / URN: https://tuprints.ulb.tu-darmstadt.de/8717

URN: urn:nbn:de:tuda-tuprints-87175
Department(s)/area(s): 20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering
Date deposited: 16 Jun 2019 19:55
Last modified: 16 Jun 2019 19:55