Kauschke, Sebastian (2019)
Patching - A Framework for Adapting Immutable Classifiers to Evolving Domains.
Technische Universität Darmstadt
Dissertation, Erstveröffentlichung
Kurzbeschreibung (Abstract)
Machine learning models are subject to changing circumstances, and will degrade over time. Nowadays, data are collected in vast amounts: Personal data is retrieved by our phones, by our internet browser, via our shopping behavior, and especially through all the content that we upload to social media platforms. Machines in factories, cars, essentially every device that is not purely mechanical anymore, may be also collecting data. This data is often used to build predictive models, e.g., for recommender systems or remaining lifetime estimation. As all things in life, the data and the knowledge extracted from a person or machine is subject to change, which is called concept drift. This concept drift may be caused by varying circumstances, changes in the expected outcome, or completely new requirements for the task. In any case, to keep a model operative, adaptive learning mechanisms are required to deal with the drift. Related works in this area cover a plethora of adaptive learning mechanisms. Usually, these algorithms are made to learn on streams of data from scratch. However, we argue that in many real-world scenarios this type of learning does not fit the actual application. It is rather, that stationary models are trained in a sandbox environment on large datasets, which are then put into practical use. If these models are not specifically constructed to be adaptive, any concept drift will lower the performance. Since training such a model, e.g., a deep neural network, can be expensive in regards of cost and time required, it is desirable to use it as long as possible. We introduce a new paradigm of adapting existing models. Our goal is to keep the existing models as long as possible, and only adapt it to the concept drift where it is necessary. We solve this by computing partial adaptations, so called patches. Via this mechanism, we can assure the existing model to live longer, and keep the learning required for adaptation to a minimum. The Patching mechanism elongates the lifetime of a machine learned model, helps to adapt with fewer observed instances, aids in individualizing an existing model, and generally increases the models’ cost efficiency. In this dissertation we first introduce a general framework for learn- ing patches as adaptation mechanisms. We evaluate the concept, and compare it against state of the art stream learning mechanisms. When dealing with normal stream scenarios, it is reasonable to apply Patch- ing. However, when dealing with scenarios which it is intended for, Patching excels in adaptation speed and overall performance. In a second contribution we specialize the patching idea on neural networks. Since neural networks are expensive and time consuming in training, we require a way of adapting them quickly. Although neural networks can be adapted via the normal training process, training them with newer data can lead to side effects such as catastrophic forgetting. Depending on the size and complexity of the network, adapting them can also be either expensive or—when given only few examples—unsuccessful. We propose neural network patching (NN- Patching) as a solution to this issue. In NN-Patching, the underlying network remains unchanged. However, a neural patch is trained by using the inner activations of the base network. These represent latent features that can be useful towards the given task. An error estimator network determines, whether the patch network or the base network is better suited to classify an instance. NN-Patching shows even more significant improvements than Patching, with quick adaptation and overall adaptive capabilities that rival those of the theoretically more capable competition. The final contribution is geared towards the use in scenarios that require model individualization or deal with re-occuring concepts. For this task we propose Ensemble Patching, a variant of Patching that builds an ensemble of patches. These patches are learned in such a way, that they each cover a distinctive type of concept drift. When a new concept emerges, a certain error pattern will occur for the base classifier. A specific patch is then learned. All ensemble members are managed via a recurrent network called the ensemble conductor. This separately trained model will conduct the ensemble decision, and is the key player for the adaptation. When concepts become outdated, the conductor will put less weight on the decisions of the respective patches, but by its structure it can quickly reactivate them, should older concepts become relevant again. Our evaluation demonstrates that this ensemble technique handles recurring concepts very well. Ensemble Patching can also be employed in a stream classification scenario, where computational efficiency is important.
Typ des Eintrags: | Dissertation | ||||
---|---|---|---|---|---|
Erschienen: | 2019 | ||||
Autor(en): | Kauschke, Sebastian | ||||
Art des Eintrags: | Erstveröffentlichung | ||||
Titel: | Patching - A Framework for Adapting Immutable Classifiers to Evolving Domains | ||||
Sprache: | Englisch | ||||
Referenten: | Fürnkranz, Prof. Dr. Johannes ; Mühlhäuser, Prof. Dr. Max ; Hammer, Prof. Dr. Barbara | ||||
Publikationsjahr: | 16 September 2019 | ||||
Ort: | Darmstadt | ||||
Datum der mündlichen Prüfung: | 6 September 2019 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/9089 | ||||
Kurzbeschreibung (Abstract): | Machine learning models are subject to changing circumstances, and will degrade over time. Nowadays, data are collected in vast amounts: Personal data is retrieved by our phones, by our internet browser, via our shopping behavior, and especially through all the content that we upload to social media platforms. Machines in factories, cars, essentially every device that is not purely mechanical anymore, may be also collecting data. This data is often used to build predictive models, e.g., for recommender systems or remaining lifetime estimation. As all things in life, the data and the knowledge extracted from a person or machine is subject to change, which is called concept drift. This concept drift may be caused by varying circumstances, changes in the expected outcome, or completely new requirements for the task. In any case, to keep a model operative, adaptive learning mechanisms are required to deal with the drift. Related works in this area cover a plethora of adaptive learning mechanisms. Usually, these algorithms are made to learn on streams of data from scratch. However, we argue that in many real-world scenarios this type of learning does not fit the actual application. It is rather, that stationary models are trained in a sandbox environment on large datasets, which are then put into practical use. If these models are not specifically constructed to be adaptive, any concept drift will lower the performance. Since training such a model, e.g., a deep neural network, can be expensive in regards of cost and time required, it is desirable to use it as long as possible. We introduce a new paradigm of adapting existing models. Our goal is to keep the existing models as long as possible, and only adapt it to the concept drift where it is necessary. We solve this by computing partial adaptations, so called patches. Via this mechanism, we can assure the existing model to live longer, and keep the learning required for adaptation to a minimum. The Patching mechanism elongates the lifetime of a machine learned model, helps to adapt with fewer observed instances, aids in individualizing an existing model, and generally increases the models’ cost efficiency. In this dissertation we first introduce a general framework for learn- ing patches as adaptation mechanisms. We evaluate the concept, and compare it against state of the art stream learning mechanisms. When dealing with normal stream scenarios, it is reasonable to apply Patch- ing. However, when dealing with scenarios which it is intended for, Patching excels in adaptation speed and overall performance. In a second contribution we specialize the patching idea on neural networks. Since neural networks are expensive and time consuming in training, we require a way of adapting them quickly. Although neural networks can be adapted via the normal training process, training them with newer data can lead to side effects such as catastrophic forgetting. Depending on the size and complexity of the network, adapting them can also be either expensive or—when given only few examples—unsuccessful. We propose neural network patching (NN- Patching) as a solution to this issue. In NN-Patching, the underlying network remains unchanged. However, a neural patch is trained by using the inner activations of the base network. These represent latent features that can be useful towards the given task. An error estimator network determines, whether the patch network or the base network is better suited to classify an instance. NN-Patching shows even more significant improvements than Patching, with quick adaptation and overall adaptive capabilities that rival those of the theoretically more capable competition. The final contribution is geared towards the use in scenarios that require model individualization or deal with re-occuring concepts. For this task we propose Ensemble Patching, a variant of Patching that builds an ensemble of patches. These patches are learned in such a way, that they each cover a distinctive type of concept drift. When a new concept emerges, a certain error pattern will occur for the base classifier. A specific patch is then learned. All ensemble members are managed via a recurrent network called the ensemble conductor. This separately trained model will conduct the ensemble decision, and is the key player for the adaptation. When concepts become outdated, the conductor will put less weight on the decisions of the respective patches, but by its structure it can quickly reactivate them, should older concepts become relevant again. Our evaluation demonstrates that this ensemble technique handles recurring concepts very well. Ensemble Patching can also be employed in a stream classification scenario, where computational efficiency is important. |
||||
Alternatives oder übersetztes Abstract: |
|
||||
URN: | urn:nbn:de:tuda-tuprints-90890 | ||||
Sachgruppe der Dewey Dezimalklassifikatin (DDC): | 000 Allgemeines, Informatik, Informationswissenschaft > 004 Informatik | ||||
Fachbereich(e)/-gebiet(e): | 20 Fachbereich Informatik 20 Fachbereich Informatik > Knowledge Engineering 20 Fachbereich Informatik > Telekooperation |
||||
Hinterlegungsdatum: | 29 Sep 2019 19:55 | ||||
Letzte Änderung: | 29 Sep 2019 19:55 | ||||
PPN: | |||||
Referenten: | Fürnkranz, Prof. Dr. Johannes ; Mühlhäuser, Prof. Dr. Max ; Hammer, Prof. Dr. Barbara | ||||
Datum der mündlichen Prüfung / Verteidigung / mdl. Prüfung: | 6 September 2019 | ||||
Export: | |||||
Suche nach Titel in: | TUfind oder in Google |
Frage zum Eintrag |
Optionen (nur für Redakteure)
Redaktionelle Details anzeigen |