
Active vision as sequential decision-making under uncertainty

Kadner, Florian (2024)
Active vision as sequential decision-making under uncertainty.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026598
Ph.D. Thesis, Primary publication, Publisher's Version

Abstract

Interacting with our visual environment can be challenging because it is highly dynamic and rich in complex interrelationships. Because the human visual system has only a narrow field of high resolution, we must actively shift our attention between different visual areas to acquire the information relevant to our tasks. Extracting this task-relevant information is made harder still by our world's inherently probabilistic nature. Sensory perception is often ambiguous: identical measurements can yield varying results, and vice versa. Similarly, the consequences of our actions are usually uncertain, owing to several internal and external factors. Finally, the relevance of completing a particular task, or even the definition of the task and its associated costs, varies widely across individuals. Thus, uncertainty is a fundamental factor at multiple stages of interacting with our visual environment. Sensory perception, decision-making, and action are inseparably intertwined, so it is all the more critical that we deal with the arising uncertainties and develop strategies to reduce them as far as possible. Computationally, this aligns with the concept of planning.

In this thesis, we investigate the active nature of visual planning as a probabilistic decision-making process under uncertainty. We designed various experimental paradigms to quantify sensory uncertainty, action variability, and the behavioral costs of human behavior in sequential visual tasks. For this purpose, we use the framework of Partially Observable Markov Decision Processes (POMDPs), which allows us to model decision-making processes normatively by incorporating different sources of uncertainty. Using three case studies, we demonstrate the framework's use, advantages, and possibilities, starting with the most straightforward visual action: blinking. Even this simple action has to be planned, since every blink briefly interrupts the visual information stream. We then move on to more complex visual actions such as saccades and gaze selection. First, we consider one-step-ahead predictions in the context of free viewing and saliency models, before moving on to a complex gaze-contingent task in which, in addition to observations, rewards are dynamic and uncertain. Last, we consider two further studies that are more detached from the experimental environment and devoted to more natural stimuli. We investigate how humans navigate mazes and how they plan their eye movements to find the solution. We also designed a reading experiment with an adaptive font system that maximizes each subject's individual reading speed and thus reduces the underlying internal behavioral costs. Our results suggest that human visual behavior should be seen as an active sequential decision process under uncertainty, for which POMDPs can provide a powerful modeling tool.
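
For readers unfamiliar with POMDPs, the core idea of acting under sensory uncertainty can be sketched as a belief update: the agent cannot observe the hidden state directly, so it maintains a probability distribution over states and revises it after each action and observation. The following is a minimal, generic illustration and is not taken from the thesis; the function name, the two-state example, and all numbers are assumptions chosen for illustration only.

```python
# Generic discrete Bayes-filter step, the belief update used in POMDPs.
# All names and numbers below are hypothetical and purely illustrative.

def belief_update(belief, transition, observation_likelihood):
    """Return the posterior belief after one action and one observation.

    belief[s]                  : prior probability of hidden state s
    transition[s][s2]          : P(next state s2 | state s, chosen action)
    observation_likelihood[s2] : P(received observation | next state s2)
    """
    n = len(belief)
    # Prediction step: propagate the prior belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = [predicted[s2] * observation_likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical example with two hidden states (e.g. "event present" / "event absent").
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]  # states tend to persist
likelihood = [0.8, 0.3]                # the observation favors state 0
posterior = belief_update(prior, transition, likelihood)
print(posterior)
```

In a full POMDP, an action such as a blink or a saccade would change both the transition model and the observation likelihood, and a planner would choose the action that maximizes expected reward given the current belief.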

Item Type: Ph.D. Thesis
Published: 2024
Creators: Kadner, Florian
Type of entry: Primary publication
Title: Active vision as sequential decision-making under uncertainty
Language: English
Referees: Rothkopf, Prof. Constantin A. ; Hayhoe, Prof. Mary M.
Date: 27 February 2024
Place of Publication: Darmstadt
Collation: viii, 159 pages
Refereed: 23 January 2024
DOI: 10.26083/tuprints-00026598
URL / URN: https://tuprints.ulb.tu-darmstadt.de/26598

Alternative Abstract:
Language: German (English translation below)

Interacting with our visual environment can be challenging because of its highly dynamic nature and its wealth of complex interrelationships. Because the human visual system has only a narrow region of high resolution, we must actively shift our attention back and forth between different visual areas. This difficulty of extracting visual information from our environment to accomplish our tasks is further amplified by the inherently probabilistic nature of our world, since perception is often ambiguous. The consequences of our actions are also usually subject to uncertainty, and even the relevance of completing a particular task, or its very definition and the associated costs, vary greatly from person to person. Uncertainty is therefore a fundamental factor in interacting with our visual environment. Sensory perception, decision-making, and action are inseparably linked, which makes it all the more important that we deal with the arising uncertainties and develop strategies against them. Computationally, this corresponds to the concept of planning. In this thesis, we investigate visual planning as a probabilistic decision-making process under uncertainty. We developed various experimental paradigms to quantify sensory uncertainty, action variability, and the behavioral costs of human behavior in sequential visual tasks. For this purpose, we use Partially Observable Markov Decision Processes (POMDPs), which allow us to model decision processes normatively and to incorporate different sources of uncertainty. We demonstrate this in three studies, beginning with the simplest visual action: blinking. Even this simple action must be planned, since every blink briefly interrupts the visual information stream. We then move on to the more complex visual actions of saccades. First, we consider one-step predictions in the context of free viewing and saliency models, before moving on to a complex example in which, in addition to the observations, the rewards are also dynamic and uncertain. Finally, we consider two further studies in more natural settings. We investigate the planning behavior of humans solving mazes. In addition, we designed a reading experiment with an adaptive font system that maximizes the individual reading speed of the participants. Our results suggest that human visual behavior should be regarded as an active sequential decision process under uncertainty, for whose modeling POMDPs can be a powerful tool.
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-265989
Additional Information:

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Technical University Darmstadt’s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.

Classification DDC: 100 Philosophy and psychology > 150 Psychology
Divisions: 03 Department of Human Sciences
03 Department of Human Sciences > Institute for Psychology
03 Department of Human Sciences > Institute for Psychology > Psychology of Information Processing
TU-Projects: DFG|RO4337/3-1|Aktives Sehen: Kontr
Date Deposited: 27 Feb 2024 13:20
Last Modified: 04 Mar 2024 20:50