Flynn, Hamish (2023)
PAC-Bayesian Bandit Algorithms With Guarantees.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024778
Dissertation, first publication, publisher's version
Abstract
PAC-Bayes is a mathematical framework that can be used to provide performance guarantees for machine learning algorithms, explain why specific machine learning algorithms work well, and design new machine learning algorithms. Since the first PAC-Bayesian theorems were proven in the late 1990s, several impressive milestones have been achieved. PAC-Bayes generalisation bounds have been used to prove tight error bounds for deep neural networks. In addition, PAC-Bayes bounds have been used to explain why machine learning principles such as large margin classification and preference for flat minima of a loss function work well. However, these milestones were achieved in simple supervised learning problems.
In this thesis, inspired by the success of the PAC-Bayes framework in supervised learning settings, we investigate the potential of the PAC-Bayes framework as a tool for designing and analysing bandit algorithms.
First, we provide a comprehensive overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. Previous works focused on PAC-Bayes bounds for martingales and their application to importance sampling-based estimates of the reward or regret of a policy. On the one hand, we found that these PAC-Bayes bounds are a useful tool for designing offline policy search algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline policy search algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online policy search algorithms that we tested had underwhelming performance and loose cumulative regret bounds.
Next, we present novel PAC-Bayes-style algorithms with worst-case regret bounds for linear bandit problems. We combine PAC-Bayes bounds with the "optimism in the face of uncertainty" principle, which reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. We use a novel PAC-Bayes-style tail bound for adaptive martingale mixtures to construct convex PAC-Bayes-style confidence sequences for (sparse) linear bandits. We show that (sparse) linear bandit algorithms based on our PAC-Bayes-style confidence sequences are guaranteed to achieve competitive worst-case regret. We also show that our confidence sequences yield confidence bounds that are tighter than competitors, both empirically and theoretically. Finally, we demonstrate that our tighter PAC-Bayes-style confidence bounds result in bandit algorithms with improved cumulative regret.
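The "optimism in the face of uncertainty" reduction mentioned above can be sketched generically: build a confidence set for the unknown reward parameter, then play the arm whose upper confidence bound is highest. The sketch below uses a standard ridge-regression ellipsoid rather than the thesis's PAC-Bayes-style confidence sequences; the function name, the regularisation `lam`, and the confidence radius `beta` are illustrative assumptions.

```python
import numpy as np

def optimistic_action(arms, X, y, lam=1.0, beta=2.0):
    """One step of the optimism reduction for a linear bandit:
    given past arm features X (t x d) and rewards y (t,), form an
    ellipsoidal confidence set around the ridge estimate and pick
    the arm with the highest upper confidence bound on its reward."""
    d = arms.shape[1]
    V = lam * np.eye(d) + X.T @ X            # regularised design matrix
    theta_hat = np.linalg.solve(V, X.T @ y)  # ridge estimate of the parameter
    V_inv = np.linalg.inv(V)
    # UCB for arm a: <a, theta_hat> + beta * ||a||_{V^{-1}}
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    ucb = arms @ theta_hat + beta * widths
    return int(np.argmax(ucb))
```

Tighter confidence sets shrink the `widths` term faster, which is exactly why the tighter PAC-Bayes-style confidence bounds described above translate into lower cumulative regret.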
| Field | Value |
|---|---|
| Type of entry | Dissertation |
| Published | 2023 |
| Author(s) | Flynn, Hamish |
| Type of record | First publication (Erstveröffentlichung) |
| Title | PAC-Bayesian Bandit Algorithms With Guarantees |
| Language | English |
| Referees | Peters, Prof. Jan ; Kandemir, Prof. Melih ; Guedj, Prof. Benjamin |
| Date of publication | 20 November 2023 |
| Place of publication | Darmstadt |
| Collation | ix, 167 pages |
| Date of oral examination | 18 October 2023 |
| DOI | 10.26083/tuprints-00024778 |
| URL / URN | https://tuprints.ulb.tu-darmstadt.de/24778 |
| Status | Publisher's version (Verlagsversion) |
| URN | urn:nbn:de:tuda-tuprints-247787 |
| Dewey Decimal Classification (DDC) | 000 Generalities, computer science, information science > 004 Computer science |
| Department(s) | 20 Department of Computer Science > Intelligent Autonomous Systems |
| Date deposited | 20 Nov 2023 14:40 |
| Last modified | 21 Nov 2023 09:44 |