
Cheap but Clever: Human Active Learning in a Bandit Setting

Zhang, Shunan ; Yu, Angela J (2013)
Cheap but Clever: Human Active Learning in a Bandit Setting.
In: Proceedings of the Annual Meeting of the Cognitive Science Society, 35
Article, Bibliography

Abstract

How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy outcomes, is an important problem in cognitive science. There are two interrelated questions: how humans represent information, both what has been learned and what can still be learned, and how they choose actions, in particular how they negotiate the tension between exploration and exploitation. In this work, we examine human behavioral data in a multi-armed bandit setting, in which the subject chooses one of four “arms” to pull on each trial and receives a binary outcome (win/lose). We implement the Bayes-optimal policy, which maximizes the expected cumulative reward in this finite-horizon bandit environment, as well as a variety of heuristic policies that vary in the complexity of their information representation and decision policy. We find that the knowledge gradient algorithm, which combines exact Bayesian learning with a decision policy that maximizes a combination of immediate reward gain and long-term knowledge gain, captures subjects’ trial-by-trial choices best among all the models considered; it also provides the best approximation to the computationally intensive optimal policy among all the heuristic policies.
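The knowledge gradient policy described in the abstract can be sketched for a Bernoulli bandit with Beta-Bernoulli learning. The following is an illustrative reconstruction, not the authors' code: the function names, the uniform Beta(1,1) prior, and the linear horizon weighting of the knowledge gain are assumptions.

```python
# Sketch of a knowledge-gradient (KG) policy for a finite-horizon
# Bernoulli bandit. Each arm is represented by Beta pseudo-counts
# (a, b), i.e. (wins + 1, losses + 1) under an assumed Beta(1,1) prior.

def posterior_means(arms):
    """Posterior mean win probability of each arm."""
    return [a / (a + b) for a, b in arms]

def kg_score(arms, i, remaining):
    """Immediate expected reward of arm i plus `remaining` times the
    expected one-step improvement in the best posterior mean (the
    knowledge gain) if arm i were pulled once."""
    means = posterior_means(arms)
    best = max(means)
    a, b = arms[i]
    p = a / (a + b)  # predictive probability that arm i yields a win
    best_others = max((m for j, m in enumerate(means) if j != i), default=0.0)
    # Posterior mean of arm i after observing a win / a loss
    up, down = (a + 1) / (a + b + 1), a / (a + b + 1)
    gain = p * max(up, best_others) + (1 - p) * max(down, best_others) - best
    return means[i] + remaining * gain

def kg_choice(arms, remaining):
    """Pick the arm with the highest KG score."""
    return max(range(len(arms)), key=lambda i: kg_score(arms, i, remaining))
```

With many trials remaining, the knowledge-gain term can favor a less-sampled arm over the current best (exploration); with no trials remaining, the score reduces to the posterior mean and the policy exploits.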

Item type: Article
Published: 2013
Author(s): Zhang, Shunan ; Yu, Angela J
Type of entry: Bibliography
Title: Cheap but Clever: Human Active Learning in a Bandit Setting
Language: English
Year of publication: 2013
Place: San Francisco, California
Publisher: PLOS
Journal, newspaper, or series title: Proceedings of the Annual Meeting of the Cognitive Science Society
Volume: 35
URL / URN: https://escholarship.org/uc/item/5xt5z4tv

Department(s)/field(s): 03 Department of Human Sciences
03 Department of Human Sciences > Institute of Psychology
Date deposited: 30 Oct 2023 09:10
Last modified: 31 Oct 2023 06:50
PPN: 512763356