In Theory and Practice - On the Rate of Convergence of Implementable Neural Network Regression Estimates

Braun, Alina (2021)
In Theory and Practice - On the Rate of Convergence of Implementable Neural Network Regression Estimates.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00019052
Ph.D. Thesis, Primary publication, Publisher's Version

Abstract

In theory, recent results in nonparametric regression show that neural network estimates are able to achieve good rates of convergence provided suitable assumptions on the structure of the regression function are imposed. However, these theoretical analyses cannot explain the practical success of neural networks, since the theoretically studied estimates are defined by minimizing the empirical L_2 risk over a class of neural networks, and in practice solving this kind of minimization problem is not feasible. Consequently, the neural networks examined in theory cannot be implemented as they are defined. This means that the neural networks used in applications differ from the ones that are analyzed theoretically.

In this thesis we narrow the gap between theory and practice. We deal with neural network regression estimates for (p,C)-smooth regression functions m that satisfy a projection pursuit model. We construct three implementable neural network estimates and show that each of them achieves, up to a logarithmic factor, the optimal univariate rate of convergence.
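For orientation, the notions used above can be written out as follows. This is a sketch of the definitions as they are commonly stated in the nonparametric regression literature, not a quotation from the thesis; the symbols q, s, K, a_k and g_k are assumptions introduced here for illustration, and the exact form of the projection pursuit model in the thesis may differ.

```latex
% (p,C)-smoothness: write p = q + s with q a nonnegative integer and 0 < s <= 1.
% m : R^d -> R is (p,C)-smooth if every partial derivative of order q exists and
\left| \frac{\partial^q m}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}(x)
     - \frac{\partial^q m}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}(z) \right|
  \le C \, \| x - z \|^{s}
  \quad \text{for all } x, z \in \mathbb{R}^d,\ \alpha_1 + \dots + \alpha_d = q .

% Projection pursuit model with K directions a_k and univariate functions g_k:
m(x) = \sum_{k=1}^{K} g_k\!\left( a_k^{T} x \right), \qquad x \in \mathbb{R}^d .

% Optimal minimax rates for (p,C)-smooth m: d-dimensional versus univariate.
n^{-\frac{2p}{2p + d}} \quad \text{versus} \quad n^{-\frac{2p}{2p + 1}} .
```

The second rate does not depend on the dimension d, which is why attaining it for multivariate data is described as circumventing the curse of dimensionality.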

Firstly, for univariate regression functions with p contained in [1/2,1] we construct a neural network estimate with one hidden layer where the weights are learned via gradient descent. The starting weights are randomly chosen from an interval independently of the data. The interval is large enough to guarantee that the estimate is close to a piecewise constant approximation.
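The general idea of the first estimate can be illustrated with a minimal NumPy sketch: a network with one hidden layer, inner weights drawn at random from a wide interval so that each sigmoid unit is close to a step function, and plain gradient descent on the empirical L_2 risk. This is not the estimate defined in the thesis. The function names, the logistic activation, the interval width `init_range`, the step size and the number of steps are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    # Logistic squasher written via tanh to avoid overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_shallow_net(x, y, n_hidden=20, init_range=50.0, lr=0.01, n_steps=1000, seed=0):
    """Fit f(x) = c0 + sum_k c_k * sigmoid(a_k * x + b_k) by plain gradient descent."""
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: inner weights are drawn independently of the
    # data from a wide interval, so each hidden unit starts close to a step function.
    a = rng.uniform(-init_range, init_range, size=n_hidden)
    b = rng.uniform(-init_range, init_range, size=n_hidden)
    c = np.zeros(n_hidden)
    c0 = 0.0
    n = x.shape[0]
    losses = []
    for _ in range(n_steps):
        h = sigmoid(np.outer(x, a) + b)               # hidden activations, shape (n, n_hidden)
        residual = h @ c + c0 - y                     # prediction error, shape (n,)
        losses.append(float(np.mean(residual ** 2)))  # empirical L_2 risk before the update
        # Gradients of the empirical L_2 risk with respect to all weights.
        grad_pre = (2.0 / n) * residual[:, None] * c[None, :] * h * (1.0 - h)
        grad_a = (grad_pre * x[:, None]).sum(axis=0)
        grad_b = grad_pre.sum(axis=0)
        grad_c = (2.0 / n) * (h.T @ residual)
        grad_c0 = 2.0 * float(np.mean(residual))
        a -= lr * grad_a
        b -= lr * grad_b
        c -= lr * grad_c
        c0 -= lr * grad_c0
    return (a, b, c, c0), losses


def predict(params, x):
    # Evaluate the fitted network at the points x.
    a, b, c, c0 = params
    return sigmoid(np.outer(x, a) + b) @ c + c0
```

Calling `fit_shallow_net(x, y)` on one-dimensional arrays `x` and `y` returns the weights and the recorded risk values; with near-step hidden units the fitted function is close to piecewise constant, which mirrors the approximation argument sketched in the abstract.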

Secondly, for multivariate regression functions with p contained in (0,1] we construct a neural network estimate with one hidden layer where the weights are learned via gradient descent. The initial weights are chosen from specific intervals depending on the data and the projection directions. This choice guarantees that the estimate is close to a piecewise constant approximation. The projection directions are repeatedly chosen randomly.

Lastly, for multivariate regression functions with p>0 we construct a multilayer neural network estimate. The values of the inner weights are prescribed, depending on the projection directions, by a new approximation result for a projection pursuit model by piecewise polynomials. The outer weights are chosen by solving a linear equation system. The projection directions are repeatedly chosen randomly.
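The two computational ingredients named above, outer weights obtained from a linear equation system and repeatedly drawn random projection directions, can be sketched in NumPy as follows. This is a simplified stand-in and not the construction of the thesis: the fixed features here are steep sigmoid units along a single direction rather than the multilayer piecewise polynomial approximation, and the small ridge term, the knot grid and the selection of the best direction by held-out error are assumptions made for illustration. All function and parameter names are hypothetical.

```python
import numpy as np


def step_features(x, direction, knots, steepness=20.0):
    # Fixed (not learned) inner weights: steep sigmoids of the projection x @ direction.
    t = x @ direction
    h = 0.5 * (1.0 + np.tanh(0.5 * steepness * (t[:, None] - knots[None, :])))
    return np.hstack([np.ones((x.shape[0], 1)), h])  # constant column acts as intercept


def solve_outer_weights(phi, y, ridge=1e-6):
    # Outer weights from the (slightly regularized) normal equations, a linear system.
    n, k = phi.shape
    gram = phi.T @ phi / n + ridge * np.eye(k)
    return np.linalg.solve(gram, phi.T @ y / n)


def fit_random_directions(x_tr, y_tr, x_val, y_val, n_draws=50, n_knots=15, seed=0):
    rng = np.random.default_rng(seed)
    # Projections onto a unit vector lie within [-radius, radius].
    radius = np.sqrt(x_tr.shape[1]) * np.max(np.abs(x_tr))
    knots = np.linspace(-radius, radius, n_knots)
    best = None
    for _ in range(n_draws):
        # Draw a random direction uniformly on the unit sphere.
        d = rng.normal(size=x_tr.shape[1])
        d /= np.linalg.norm(d)
        w = solve_outer_weights(step_features(x_tr, d, knots), y_tr)
        err = float(np.mean((step_features(x_val, d, knots) @ w - y_val) ** 2))
        # Assumption: keep the draw with the smallest held-out squared error.
        if best is None or err < best[0]:
            best = (err, d, w)
    return best
```

Because the inner weights are fixed once a direction is drawn, each candidate requires only one linear solve, so no nonconvex optimization is involved in this sketch.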

Since we are able to show a rate of convergence that is independent of the dimension of the data, our second and third estimates are able to circumvent the curse of dimensionality.

Item Type: Ph.D. Thesis
Published: 2021
Creators: Braun, Alina
Type of entry: Primary publication
Title: In Theory and Practice - On the Rate of Convergence of Implementable Neural Network Regression Estimates
Language: English
Referees: Kohler, Prof. Dr. Michael ; Betz, Prof. Dr. Volker
Date: 2021
Place of Publication: Darmstadt
Collation: x, 219 pages
Refereed: 25 June 2021
DOI: 10.26083/tuprints-00019052
URL / URN: https://tuprints.ulb.tu-darmstadt.de/19052
Alternative Abstract (translated from German):

Theoretical results in nonparametric regression estimation show that, under suitable assumptions on the regression function, neural network estimates achieve good rates of convergence. However, the neural networks studied there are defined through a minimization problem for the empirical L_2 risk over a class of neural networks, and this problem is not feasible in practice. Consequently, these theoretically studied neural networks cannot be implemented as they are defined. The neural networks used in practice therefore differ from those treated in theory.

In this thesis we narrow this gap between neural networks applied in practice and those studied in theory. We deal with neural network estimates for (p,C)-smooth regression functions m that satisfy the projection pursuit model. We construct three implementable neural network estimates and show that, up to a logarithmic factor, they achieve the optimal univariate rate of convergence.

First, for univariate regression functions with p in [1/2,1] we construct a neural network estimate with one hidden layer in which the weights are learned by gradient descent. The starting weights are chosen randomly from an interval that is large enough to guarantee that our estimate is close to a piecewise constant approximation.

Next, for multivariate regression functions with p in (0,1] we construct a neural network estimate with one hidden layer in which the weights are learned by gradient descent. The starting weights are chosen from specific intervals depending on the data and the projection directions. This choice guarantees that our estimate is close to a piecewise constant approximation. The projection directions are repeatedly chosen randomly.

Finally, for multivariate regression functions with p>0 we construct a neural network estimate with many hidden layers. The inner weights are prescribed by a new approximation result for the projection pursuit model by piecewise polynomials. The outer weights are determined by solving a linear equation system. The projection directions are repeatedly chosen randomly.

Since we show a rate of convergence that is independent of the dimension of the data, our second and third estimates are able to circumvent the curse of dimensionality.

Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-190528
Classification DDC: 500 Science and mathematics > 510 Mathematics
Divisions: 04 Department of Mathematics
04 Department of Mathematics > Stochastik
Date Deposited: 11 Aug 2021 08:47
Last Modified: 16 Aug 2021 07:42
Refereed / Defense / Oral examination: 25 June 2021