Zhang, Shunan; Yu, Angela J. (2013)
Cheap but Clever: Human Active Learning in a Bandit Setting.
In: Proceedings of the Annual Meeting of the Cognitive Science Society, 35
Article, Bibliography
Abstract
How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy outcomes, is an important problem in cognitive science. There are two interrelated questions: how humans represent information, both what has been learned and what can still be learned, and how they choose actions, in particular how they negotiate the tension between exploration and exploitation. In this work, we examine human behavioral data in a multi-armed bandit setting, in which the subject chooses one of four “arms” to pull on each trial and receives a binary outcome (win/lose). We implement both the Bayes-optimal policy, which maximizes the expected cumulative reward in this finite-horizon bandit environment, and a variety of heuristic policies that vary in the complexity of their information representation and decision policy. We find that the knowledge gradient algorithm, which combines exact Bayesian learning with a decision policy that maximizes a combination of immediate reward gain and long-term knowledge gain, captures subjects’ trial-by-trial choices best among all the models considered; it also provides the best approximation to the computationally intensive optimal policy among all the heuristic policies.
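The knowledge gradient policy described in the abstract can be sketched for the Bernoulli bandit case: each arm carries a Beta posterior, and the policy scores each arm by its posterior mean plus the expected improvement in the best posterior mean that one more pull would bring, weighted by the remaining horizon. The function below is a minimal illustration under those standard assumptions, not the authors' actual implementation; the weighting of the knowledge gain by the number of remaining trials follows the common "online KG" formulation.

```python
import numpy as np

def kg_choose(alpha, beta, trials_left):
    """Knowledge-gradient arm choice for a Bernoulli bandit with Beta(alpha, beta) posteriors.

    Returns the index of the arm maximizing: posterior mean + trials_left * KG value,
    where the KG value is the expected gain in the best posterior mean from one pull.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    mu = alpha / (alpha + beta)        # posterior mean reward of each arm
    best = mu.max()
    nu = np.empty_like(mu)             # knowledge-gradient value of pulling each arm
    for k in range(len(mu)):
        p_win = mu[k]                  # predictive probability of a win on arm k
        # posterior mean of arm k after observing a win or a loss
        mu_win = (alpha[k] + 1) / (alpha[k] + beta[k] + 1)
        mu_lose = alpha[k] / (alpha[k] + beta[k] + 1)
        others = np.delete(mu, k)
        best_others = others.max() if others.size else -np.inf
        # expected value of the best arm after one (hypothetical) pull of arm k
        exp_best = (p_win * max(mu_win, best_others)
                    + (1 - p_win) * max(mu_lose, best_others))
        nu[k] = exp_best - best
    # trade off immediate reward against long-term knowledge gain
    return int(np.argmax(mu + trials_left * nu))
```

With no trials left the rule is purely greedy; with a long horizon it favors uncertain arms whose outcome could change which arm looks best, which is the exploration/exploitation trade-off the paper studies.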
Entry type: | Article |
---|---|
Published: | 2013 |
Author(s): | Zhang, Shunan; Yu, Angela J. |
Record type: | Bibliography |
Title: | Cheap but Clever: Human Active Learning in a Bandit Setting |
Language: | English |
Year of publication: | 2013 |
Place: | San Francisco, California |
Publisher: | PLOS |
Journal or series title: | Proceedings of the Annual Meeting of the Cognitive Science Society |
Volume: | 35 |
URL / URN: | https://escholarship.org/uc/item/5xt5z4tv |
Department(s)/field(s): | 03 Department of Human Sciences; 03 Department of Human Sciences > Institute of Psychology |
Date deposited: | 30 Oct 2023 09:10 |
Last modified: | 31 Oct 2023 06:50 |
PPN: | 512763356 |