Zhou, Corey Yishan ; Guo, Dalin ; Yu, Angela J. (2020)
Devaluation of Unchosen Options: A Bayesian Account of the Provenance and Maintenance of Overly Optimistic Expectations.
42nd Annual Meeting of the Cognitive Science Society (CogSci 2020). Online (originally Toronto, Canada) (29.07.2020-01.08.2020)
Conference publication, Bibliography
Abstract
Humans frequently overestimate the likelihood of desirable events while underestimating the likelihood of undesirable ones: a phenomenon known as unrealistic optimism. Previously, it was suggested that unrealistic optimism arises from asymmetric belief updating, with a relatively reduced coding of undesirable information. Prior studies have shown that a reinforcement learning (RL) model with asymmetric learning rates (greater for a positive prediction error than for a negative prediction error) can account for unrealistic optimism in a bandit task, in particular the tendency of human subjects to persistently choose a single option when there are multiple equally good options. Here, we propose an alternative explanation of such persistent behavior by modeling human behavior using a Bayesian hidden Markov model, the Dynamic Belief Model (DBM). We find that DBM captures human choice behavior better than the previously proposed asymmetric RL model. Whereas asymmetric RL attains a measure of optimism by giving better-than-expected outcomes higher learning weights than worse-than-expected outcomes, DBM does so by progressively devaluing the unchosen options, thus placing a greater emphasis on choice history independent of reward outcome (e.g., an oft-chosen option might continue to be preferred even if it has not been particularly rewarding), which has broadly been shown to underlie sequential effects in a variety of behavioral settings. Moreover, previous work showed that the devaluation of unchosen options in DBM helps to compensate for a default assumption of environmental non-stationarity, thus allowing the decision-maker both to be more adaptive in changing environments and to still obtain near-optimal performance in stationary environments. Thus, the current work suggests both a novel rationale and a novel mechanism for persistent behavior in bandit tasks.
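The two mechanisms contrasted in the abstract can be sketched in a few lines of code. This is an illustrative reduction, not the paper's implementation: the function names, parameter values, and the point-estimate form of the DBM update (the actual model tracks full Bayesian beliefs over reward rates) are assumptions for clarity.

```python
def asymmetric_rl_update(value, reward, alpha_pos=0.4, alpha_neg=0.1):
    """Asymmetric RL: better-than-expected outcomes (positive prediction
    error) get a larger learning rate than worse-than-expected ones,
    yielding optimistically biased value estimates."""
    pe = reward - value  # prediction error
    alpha = alpha_pos if pe > 0 else alpha_neg
    return value + alpha * pe

def dbm_devalue_unchosen(values, chosen, gamma=0.8, prior=0.5):
    """DBM-style trial update (point-estimate caricature): because the
    environment is assumed non-stationary (change probability 1 - gamma),
    beliefs about unobserved, i.e. unchosen, options decay toward the
    prior mean, so an oft-chosen option keeps an advantage independent
    of its reward history."""
    updated = list(values)
    for i in range(len(updated)):
        if i != chosen:
            updated[i] = gamma * updated[i] + (1 - gamma) * prior
    return updated
```

For example, with three equally valued options, repeatedly choosing option 0 makes the unchosen options 1 and 2 drift toward the prior, so option 0 remains preferred even without above-average reward, which is the persistence pattern the paper attributes to DBM.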
Type of entry: | Conference publication |
---|---|
Published: | 2020 |
Author(s): | Zhou, Corey Yishan ; Guo, Dalin ; Yu, Angela J. |
Entry type: | Bibliography |
Title: | Devaluation of Unchosen Options: A Bayesian Account of the Provenance and Maintenance of Overly Optimistic Expectations |
Language: | English |
Year of publication: | 2020 |
Place: | Red Hook, NY |
Publisher: | Curran Associates, Inc. |
Journal, newspaper, or series title: | 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) |
Volume: | 42 |
Book title: | Proceedings of the Annual Meeting of the Cognitive Science Society |
Event title: | 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) |
Event location: | Online (originally Toronto, Canada) |
Event dates: | 29.07.2020-01.08.2020 |
URL / URN: | https://cognitivesciencesociety.org/wp-content/uploads/2022/... |
Division(s)/department(s): | 03 Department of Human Sciences 03 Department of Human Sciences > Institute of Psychology |
Date deposited: | 06 Nov 2023 13:53 |
Last modified: | 07 Nov 2023 12:04 |
PPN: | 512974039 |