Pretreatment of Environmental Data for Forecasting Purposes

Rüppel, Uwe ; Göbel, Peter
Eds.: Möller, Andreas ; Page, Bernd ; Schreiber, Martin (2008)
Pretreatment of Environmental Data for Forecasting Purposes.
22nd International Conference on Informatics for Environmental Protection (EnviroInfo). Lüneburg, Germany (10-12 September 2008)
Conference publication, Bibliography

URL / URN: http://enviroinfo.eu/sites/default/files/pdfs/vol116/0569.pd...

Abstract

To assess the environmental impact of present actions, it is necessary to estimate their effects in the future. As Niels Bohr observed, “prediction is very difficult, especially about the future”. Fortunately, “the future is made of the same stuff as the present” (Simone Weil). This is what makes forecasting possible in principle. The present is described by data; to draw the right conclusions about the future, these data need to be significant, correct and complete. This work is part of a project on the active control of groundwater levels, in which Artificial Neural Networks (ANN) forecast the groundwater levels expected for varying infiltration quantities. On this basis, the infiltration quantity needed to reach a desired groundwater level can be identified. Before the environmental data are suitable for this forecasting purpose, they must undergo a wide range of pretreatments, which are described in this paper. In a first step, substitution methods for imputing missing data are presented. These methods fall into two categories. The first comprises correlation and kriging methods, which use related measurement series, i.e. series from a nearby measuring station for quantities such as groundwater level, temperature or rainfall. The second uses only the series under consideration and consists of purely statistical methods: spline interpolation, time-series forecasts and multiple imputation. In a second step, the completed data sets must be freed from gross errors. For this purpose, several test criteria are implemented, such as bound checking, comparisons of spacings and various statistical methods. Furthermore, the original dynamic, time-variant data sets are compared with data sets computed by time series analysis models; outliers are indicated where the computed values diverge strongly from the original values.
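The first two pretreatment steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: ordinary least-squares regression on a nearby station stands in for the correlation methods, linear interpolation stands in for spline interpolation (kriging, time-series forecasts and multiple imputation are omitted), and a moving-median persistence forecast stands in for the time series analysis models. All function names, thresholds and sample values are hypothetical; `None` marks a missing measurement.

```python
from statistics import mean, median


# --- Step 1: imputing missing values (None marks a gap) ---

def fill_from_neighbour(target, neighbour):
    """Correlation-style fill: fit target = a + b * neighbour by ordinary
    least squares on the time steps where both stations have data, then
    predict the target's gaps from the neighbouring station.
    Assumes at least two overlapping observations with distinct neighbour
    values (otherwise the slope is undefined)."""
    pairs = [(x, y) for x, y in zip(neighbour, target)
             if x is not None and y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    filled = []
    for x, y in zip(neighbour, target):
        if y is None and x is not None:
            y = intercept + slope * x   # predicted from the neighbour
        filled.append(y)                # stays None if both are missing
    return filled


def fill_by_interpolation(series):
    """Single-series fill: linear interpolation between the nearest known
    values on either side (a simple stand-in for spline interpolation).
    Leading and trailing gaps are left as None."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = out[a] + t * (out[b] - out[a])
    return out


# --- Step 2: screening a completed series for gross errors ---

def flag_gross_errors(series, lower, upper, max_step):
    """Indices that fail a bound check (outside [lower, upper]) or a spacing
    check (jump from the previous value larger than max_step)."""
    flags = []
    for i, v in enumerate(series):
        out_of_bounds = not lower <= v <= upper
        big_jump = i > 0 and abs(v - series[i - 1]) > max_step
        if out_of_bounds or big_jump:
            flags.append(i)
    return flags


def flag_model_residuals(series, window, threshold):
    """Compare each value with a computed value, here the median of the
    preceding `window` values (a naive persistence model standing in for a
    time series analysis model), and flag strong divergences."""
    flags = []
    for i in range(window, len(series)):
        predicted = median(series[i - window:i])
        if abs(series[i] - predicted) > threshold:
            flags.append(i)
    return flags
```

A single spike fails the spacing check twice (the jump up and the jump back down), whereas the model-residual check localizes it to one time step, which illustrates why several criteria may be worth running side by side.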
In doubtful cases, the current curve can also be compared with the curve of a correlated data set, if one is available. In a third step, to reduce complexity, the number of values that serve as input parameters for the ANN must be reduced without losing the information needed to make predictions. This is necessary because, in the present case, the number of required input parameters is too high relative to the number of training sets available to train the ANN. Several statistical approaches are discussed, such as moving averages, time-weighted transformations and a method that combines sets of moving averages, which reduce the number of ANN input parameters while preserving the information content.
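The input reduction in the third step can be sketched in Python under stated assumptions: instead of feeding the ANN one input per past time step, each window length contributes a single aggregate, either a plain trailing moving average or a linearly time-weighted average in which recent values count more. The window lengths, the linear weighting scheme and the function names are hypothetical choices for illustration, not the combination method of the paper.

```python
def trailing_mean(series, end, window):
    """Plain moving average of the `window` values ending at index `end`."""
    chunk = series[end - window + 1:end + 1]
    return sum(chunk) / len(chunk)


def linearly_weighted_mean(series, end, window):
    """Time-weighted moving average: the most recent value gets weight
    `window`, the oldest gets weight 1 (linearly decreasing weights)."""
    chunk = series[end - window + 1:end + 1]
    weights = range(1, len(chunk) + 1)
    return sum(w * v for w, v in zip(weights, chunk)) / sum(weights)


def reduce_inputs(series, end, windows, weighted=False):
    """Replace the raw lag values series[end - max(windows) + 1 : end + 1]
    with one aggregate per window length, so len(windows) ANN inputs
    take the place of max(windows) inputs."""
    if end + 1 < max(windows):
        raise ValueError("not enough history for the longest window")
    aggregate = linearly_weighted_mean if weighted else trailing_mean
    return [aggregate(series, end, w) for w in windows]
```

For a daily series with `end` at the latest day, `reduce_inputs(series, end, (7, 30, 90))` would replace 90 raw lag values with three inputs; whether a given set of windows preserves enough information would still have to be checked against the ANN's forecasting error.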

Item type:	Conference publication
Published:	2008
Editors:	Möller, Andreas ; Page, Bernd ; Schreiber, Martin
Author(s):	Rüppel, Uwe ; Göbel, Peter
Entry type:	Bibliography
Title:	Pretreatment of Environmental Data for Forecasting Purposes
Language:	English
Publication date:	September 2008
Place of publication:	Aachen
Publisher:	Shaker Verlag
Book title:	EnviroInfo 2008: Environmental Informatics and Industrial Ecology
Event title:	22nd International Conference on Informatics for Environmental Protection (EnviroInfo)
Event location:	Lüneburg, Germany
Event date:	10-12 September 2008
URL / URN:	http://enviroinfo.eu/sites/default/files/pdfs/vol116/0569.pd...
Additional information:	ISBN: 978-3-8322-7313-2
Department(s)/field(s):	13 Department of Civil and Environmental Engineering; 13 Department of Civil and Environmental Engineering > Institute of Numerical Methods and Informatics in Civil Engineering
Date deposited:	21 Jan 2015 13:15
Last modified:	04 Jan 2021 14:06
