TU Darmstadt / ULB / TUbiblio

Reward-Rate Maximization in Sequential Identification under a Stochastic Deadline

Dayanik, Savas ; Yu, Angela J. (2013)
Reward-Rate Maximization in Sequential Identification under a Stochastic Deadline.
In: SIAM Journal on Control and Optimization, 51 (4)
doi: 10.1137/100818005
Artikel, Bibliographie

Kurzbeschreibung (Abstract)

Any intelligent system performing evidence-based decision making under time pressure must negotiate a speed-accuracy trade-off. In computer science and engineering, this is typically modeled as minimizing a Bayes-risk functional that is a linear combination of expected decision delay and expected terminal decision loss. In neuroscience and psychology, however, it is often modeled as maximizing the long-term reward rate, or the ratio of expected terminal reward and expected decision delay. The two approaches have opposing advantages and disadvantages. While Bayes-risk minimization can be solved with powerful dynamic programming techniques unlike reward-rate maximization, it also requires the explicit specification of the relative costs of decision delay and error, which is obviated by reward-rate maximization. Here, we demonstrate that, for a large class of sequential multihypothesis identification problems under a stochastic deadline, the reward-rate maximization is equivalent to a special case of Bayes-risk minimization, in which the optimal policy that attains the minimal risk when the unit sampling cost is exactly the maximal reward rate is also the policy that attains maximal reward rate. We show that the maximum reward rate is the unique unit sampling cost for which the expected total observation cost and expected terminal reward break even under every Bayes-risk optimal decision rule. This interplay between reward-rate maximization and Bayesrisk minimization formulations allows us to show that maximum reward rate is always attained. We can compute the policy that maximizes reward rate by solving an inverse Bayes-risk minimization problem, whereby we know the Bayes risk of the optimal policy and need to find the associated unit sampling cost parameter. Leveraging this equivalence, we derive an iterative dynamic programming procedure for solving the reward-rate maximization problem exponentially fast, thus incorporating the advantages of both the reward-rate maximization and Bayes-risk minimization formulations. As an illustration, we will apply the procedure to a two-hypothesis identification example.

Typ des Eintrags: Artikel
Erschienen: 2013
Autor(en): Dayanik, Savas ; Yu, Angela J.
Art des Eintrags: Bibliographie
Titel: Reward-Rate Maximization in Sequential Identification under a Stochastic Deadline
Sprache: Englisch
Publikationsjahr: Januar 2013
Ort: Philadelphia
Verlag: Society for Industrial and Applied Mathematics
Titel der Zeitschrift, Zeitung oder Schriftenreihe: SIAM Journal on Control and Optimization
Jahrgang/Volume einer Zeitschrift: 51
(Heft-)Nummer: 4
DOI: 10.1137/100818005
URL / URN: http://epubs.siam.org/doi/10.1137/100818005
Kurzbeschreibung (Abstract):

Any intelligent system performing evidence-based decision making under time pressure must negotiate a speed-accuracy trade-off. In computer science and engineering, this is typically modeled as minimizing a Bayes-risk functional that is a linear combination of expected decision delay and expected terminal decision loss. In neuroscience and psychology, however, it is often modeled as maximizing the long-term reward rate, or the ratio of expected terminal reward and expected decision delay. The two approaches have opposing advantages and disadvantages. While Bayes-risk minimization can be solved with powerful dynamic programming techniques unlike reward-rate maximization, it also requires the explicit specification of the relative costs of decision delay and error, which is obviated by reward-rate maximization. Here, we demonstrate that, for a large class of sequential multihypothesis identification problems under a stochastic deadline, the reward-rate maximization is equivalent to a special case of Bayes-risk minimization, in which the optimal policy that attains the minimal risk when the unit sampling cost is exactly the maximal reward rate is also the policy that attains maximal reward rate. We show that the maximum reward rate is the unique unit sampling cost for which the expected total observation cost and expected terminal reward break even under every Bayes-risk optimal decision rule. This interplay between reward-rate maximization and Bayesrisk minimization formulations allows us to show that maximum reward rate is always attained. We can compute the policy that maximizes reward rate by solving an inverse Bayes-risk minimization problem, whereby we know the Bayes risk of the optimal policy and need to find the associated unit sampling cost parameter. Leveraging this equivalence, we derive an iterative dynamic programming procedure for solving the reward-rate maximization problem exponentially fast, thus incorporating the advantages of both the reward-rate maximization and Bayes-risk minimization formulations. As an illustration, we will apply the procedure to a two-hypothesis identification example.

Zusätzliche Informationen:

6 citations (Crossref) 2023-10-13

Fachbereich(e)/-gebiet(e): 03 Fachbereich Humanwissenschaften
03 Fachbereich Humanwissenschaften > Institut für Psychologie
Hinterlegungsdatum: 30 Okt 2023 13:31
Letzte Änderung: 31 Okt 2023 06:48
PPN: 512773025
Export:
Suche nach Titel in: TUfind oder in Google
Frage zum Eintrag Frage zum Eintrag

Optionen (nur für Redakteure)
Redaktionelle Details anzeigen Redaktionelle Details anzeigen