Salikutluk, Vildan (2024)
Human Problem-Solving with Interactive Artificial Intelligence.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00028908
Dissertation, first publication, publisher's version
Abstract
Humans constantly have to solve complex problems, often with uncertain and incomplete information. Finding adequate strategies to solve different types of problems is a hallmark of human intelligence. While this ability allows humans to navigate many (unknown) challenges, they can still experience difficulties during problem-solving and are likely to benefit from well-designed support tools. Recent artificial intelligence (AI) systems offer possibilities to aid humans in many tasks. Especially if the strengths of humans and AI are combined, there is great potential for improved performance and solutions. However, it is not always clear how to design such complementary human-AI interaction. A human-centered approach is promising, as it helps us understand how humans solve different problems and where AI can best support them. This enables us to tailor interactions and AI design to the user. To achieve this, we must consider the features of the problems and how humans solve them. Importantly, investigating humans' cognitive processes and solution steps is crucial, not only to identify their limitations in different problem-solving settings, but also to design AI tools that are useful and well integrated into these processes.
The focus of this thesis is to examine how humans solve different types of problems with interactive AI systems. We use a mixed-methods approach to obtain qualitative insights into the underlying cognitive processes and quantitative data about human behavior, performance, and confidence during problem-solving. These results help us understand what is important in both well-defined and ill-defined problems. Furthermore, we can investigate what happens when appropriate AI systems are employed to potentially support humans during their problem-solving process. To examine such human-AI interaction, we conduct several empirical studies. In the first one, a human and an AI agent have to solve a well-defined problem collaboratively. That is, they solve a task together in which all steps and sub-tasks that need to be completed are known. In this study, the overall performance is influenced by the coordination of sub-tasks: who solves a particular sub-task and in which order all of them are completed. Thus, we examine how humans coordinate with an AI agent. To do this, we designed our experimental task to include sub-tasks that can be solved by only the human, only the agent, or both; some sub-tasks also have interdependent steps. Therefore, the interaction and coordination have a substantial influence on how efficiently and how well the human-AI team (HAT) performs. In such settings, AI autonomy is crucial: who handles each sub-task, and how efficiently the sub-tasks are solved, depends on how interactions and communication are initiated and carried out between humans and AI agents. Thus, we empirically investigate the impact of AI autonomy on HAT performance and user satisfaction in a cooperative task in a simulated shared workspace. Specifically, we compare fixed AI autonomy levels with situation-dependent autonomy adaptation. We find that HATs performed best when the AI adjusted its autonomy based on the current situation. Users also rated this agent highest in terms of perceived intelligence. Our findings highlight the benefits of adaptive AI autonomy in settings where humans solve such a well-defined problem together with an AI agent.
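To make the comparison concrete, the following is a minimal, hypothetical sketch of what situation-dependent autonomy adaptation could look like; the decision rule, the sub-task attributes, and the interaction modes are illustrative assumptions, not the agent design used in the study.

```python
# Hypothetical sketch of situation-dependent autonomy adaptation (illustration
# only): the agent picks an interaction mode per sub-task based on who can
# solve it and whether interdependent steps make it block the human.
from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    agent_capable: bool   # can the AI agent complete this sub-task?
    human_capable: bool   # can the human complete this sub-task?
    blocks_human: bool    # do interdependent steps wait on this sub-task?

def choose_autonomy(task: SubTask) -> str:
    """Return the interaction mode the agent adopts for this sub-task."""
    if task.agent_capable and not task.human_capable:
        return "act autonomously"           # only the agent can do it
    if not task.agent_capable:
        return "request human action"       # only the human can do it
    if task.blocks_human:
        return "announce and act"           # shared sub-task on the critical path
    return "offer and wait for approval"    # shared sub-task, low urgency

# Example: a shared sub-task that holds up the human's next step.
print(choose_autonomy(SubTask("hand over part", True, True, True)))  # announce and act
```

A fixed-autonomy baseline would instead return the same mode for every sub-task, which is the contrast the study evaluates.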
Furthermore, we explore how humans solve an example task for ill-defined problems. Specifically, we investigate guesstimation, i.e., the estimation of unknown quantities from incomplete or highly uncertain information. Guesstimation problems are ill-defined since multiple approaches are possible, and often it is not even clear how to evaluate the quality of solutions. However, if the quality of a solution cannot be determined in experiments, performance in such tasks becomes very hard to investigate. To address this, we devised guesstimation problems across a wide range of domains to which we know the answers, but which participants in our study could not know or look up directly. Using these questions allowed us to analyze the problem-solving process systematically with a mixed-methods approach. We examined our participants' underlying solution processes qualitatively by collecting think-aloud protocols during guesstimation. With such rich data, we were able to identify their solution strategies and how they approach these problems. In addition, we collected quantitative measures of their performance and their confidence in their answers. We found that participants solved guesstimation problems reasonably well. They decomposed the questions into sub-questions and often transformed them into semantically related ones that were easier to answer. However, this is also where impasses frequently occurred: participants were often unable to brainstorm semantic transformations and got stuck, leading them to simply guess an answer. To address this impasse, we provided another AI system: we prompted a Large Language Model (LLM) so that it could provide ideas for transformations during this brainstorming step within guesstimation. We then tested the impact of such an AI tool's availability on task performance. Thus, we not only identified guesstimation as a promising testbed for studying human-AI interaction in ill-defined problem-solving settings, but also provide in-depth evaluations. While the tool successfully produced human-like suggestions, participants were reluctant to use it. Because of this, we found no significant difference in the participants' performance based on the tool's availability. Given our results, we reflect on why LLMs are not (yet) capable of significantly increasing performance in these kinds of tasks. We discuss why the design of AI tools for such cognitive support is not trivial, but also point to promising directions for future work.
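As an illustration of the decomposition strategy described above, here is a hypothetical worked example in the style of a classic Fermi problem; the question and all numbers are assumptions for illustration and do not come from the study materials.

```python
# Hypothetical worked example of guesstimation by decomposition (a classic
# Fermi-style question, not one from the study): estimate the number of piano
# tuners in a city of one million by combining rough answers to sub-questions.
population = 1_000_000                  # inhabitants (assumed)
people_per_household = 2.5
households = population / people_per_household
share_of_households_with_piano = 0.05   # rough guess
pianos = households * share_of_households_with_piano
tunings_per_piano_per_year = 1
tunings_needed = pianos * tunings_per_piano_per_year
tunings_per_tuner_per_year = 4 * 250    # ~4 tunings a day, ~250 workdays
tuners = tunings_needed / tunings_per_tuner_per_year

print(round(tuners))  # on the order of 20 piano tuners
```

The impasse described above arises when a solver cannot find such sub-questions or a semantically related reformulation, which is where the LLM-based brainstorming support was meant to help.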
We also observed that the LLM we used as a brainstorming tool sometimes generated outputs containing harmful biases, for instance, when the guesstimation questions included references to certain regions of the world. To ensure that AI systems are human-centered, we need not only to integrate them well into the cognitive processes of problem-solvers, but also to make them fair and prevent them from causing harm. This will be especially critical if such tools are used for guesstimation tasks in the real world, such as (geo-)political forecasting. We therefore investigate biases in LLMs systematically. For this study, we focus on whether different state-of-the-art LLMs show biases in terms of gender and religion. Our findings show that (intersectional) biases are indeed present in all LLMs we tested, despite many debiasing efforts. The LLMs are still significantly more likely to produce outputs that are in line with harmful stereotypes against marginalized groups. Therefore, we discuss what it would mean to employ these systems in real-world problem-solving settings, and what measures could be used to uncover and ultimately improve the unfair outputs of LLMs.
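As a rough illustration of how such a bias audit can be set up, the following is a minimal, hypothetical sketch of a template-based probe over gender and religion terms; the template, the group terms, and the placeholder query_llm function are assumptions and do not reflect the exact evaluation protocol of the thesis.

```python
# Hypothetical sketch of a template-based probe for (intersectional) bias:
# the same sentence template is filled with different gender and religion
# terms, and the model's completions are compared across groups.
from itertools import product

TEMPLATE = "The {religion} {gender} who applied for the job was described as"
GENDERS = ["man", "woman"]
RELIGIONS = ["Christian", "Muslim", "Jewish", "Hindu"]

def query_llm(prompt: str) -> str:
    """Placeholder: call the LLM under test and return its completion."""
    raise NotImplementedError

def collect_completions(n_samples: int = 20) -> dict:
    """Sample completions per group so they can later be coded for stereotypes."""
    results = {}
    for religion, gender in product(RELIGIONS, GENDERS):
        prompt = TEMPLATE.format(religion=religion, gender=gender)
        results[(religion, gender)] = [query_llm(prompt) for _ in range(n_samples)]
    return results

# The sampled completions would then be annotated (e.g., for stereotype-
# consistent descriptions) and compared across groups with a statistical test.
```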
In summary, this thesis investigates human problem-solving with interactive AI systems. We show that different problem types, i.e., well-defined and ill-defined ones, require different considerations regarding AI support and the interaction with such systems to ensure a human-centered approach. We empirically test what humans need and prefer, as well as how they coordinate with agents while solving a well-defined problem. We also explore ill-defined problem-solving with AI in the case of guesstimation. We examine how humans approach and solve guesstimation problems, which informed where AI support is most promising. This approach takes into account both the needs of the human and the capabilities of current AI systems, such as LLMs. Thus, we not only identify guesstimation as a suitable case for potential complementarity by combining the strengths of humans and AI systems, but also investigate it in depth. In both our well-defined and ill-defined problem-solving settings, we observe advantages and shortcomings of the human-AI interaction. We discuss the factors influencing task performance and interaction in each setting, and which future directions are promising. We show how our findings and our perspective of combining cognitive science and interaction research can further improve our understanding and, ultimately, the design of fair and beneficial human-AI interaction for problem-solving.
Item type: Dissertation
Published: 2024
Author(s): Salikutluk, Vildan
Type of entry: First publication
Title: Human Problem-Solving with Interactive Artificial Intelligence
Language: English
Referees: Jäkel, Prof. Dr. Frank ; Chuang, Prof. Dr. Lewis
Date of publication: 20 December 2024
Place of publication: Darmstadt
Collation: 136 pages
Date of oral examination: 9 December 2024
DOI: 10.26083/tuprints-00028908
URL / URN: https://tuprints.ulb.tu-darmstadt.de/28908
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-289081
Dewey Decimal Classification (DDC): 100 Philosophy and psychology > 150 Psychology
Department(s)/field(s): 03 Department of Human Sciences ; 03 Department of Human Sciences > Institute of Psychology ; 03 Department of Human Sciences > Institute of Psychology > Models of Higher Cognition
Date deposited: 20 Dec 2024 13:12
Last modified: 21 Dec 2024 15:39