
Python Code Generation by Asking Clarification Questions

Li, Haau-Sing ; Mesgar, Mohsen ; Martins, André F. T. ; Gurevych, Iryna (2023)
Python Code Generation by Asking Clarification Questions.
61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada (09.-14.07.2023)
Conference publication, Bibliography

Abstract

Code generation from text requires understanding the user’s intent from a natural language description and generating an executable code snippet that satisfies this intent. While recent pretrained language models demonstrate remarkable performance for this task, these models fail when the given natural language description is under-specified. In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. Therefore, we collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers. Our empirical evaluation of pretrained language models on code generation shows that clarifications result in more precisely generated code, as reflected in substantial improvements across all evaluation metrics. Alongside this, our task and dataset introduce new challenges to the community, including when and what clarification questions should be asked. Our code and dataset are available on GitHub.
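The setup described in the abstract can be illustrated with a small sketch. All field names and the example record below are hypothetical, not taken from the CodeClarQA dataset: the idea is that an under-specified description is paired with synthetic clarification questions and answers, and the answers are appended to produce a fully specified prompt for a code-generation model.

```python
# Hypothetical CodeClarQA-style record (illustrative only; the dataset's
# actual schema may differ): an under-specified natural language description,
# synthetic clarification question/answer pairs, and the target code snippet.
record = {
    "description": "Sort the list of users.",  # ambiguous: by which key? in which order?
    "clarifications": [
        {"question": "Which attribute should the users be sorted by?",
         "answer": "age"},
        {"question": "Should the order be ascending or descending?",
         "answer": "descending"},
    ],
    "code": "sorted(users, key=lambda u: u['age'], reverse=True)",
}

def resolved_prompt(rec: dict) -> str:
    """Append clarification answers to the description, yielding a
    fully specified prompt for a code-generation model."""
    qa = " ".join(f"Q: {c['question']} A: {c['answer']}"
                  for c in rec["clarifications"])
    return f"{rec['description']} {qa}"

# The clarified prompt now pins down both the sort key and the order.
prompt = resolved_prompt(record)
print(prompt)
```

The sketch shows why clarifications help: the original description admits many valid implementations, while the resolved prompt constrains the model to a single intended one.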

Entry type: Conference publication
Published: 2023
Author(s): Li, Haau-Sing ; Mesgar, Mohsen ; Martins, André F. T. ; Gurevych, Iryna
Record type: Bibliography
Title: Python Code Generation by Asking Clarification Questions
Language: English
Publication date: 24 July 2023
Publisher: ACL
Book title: The 61st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference Volume 1: Long Papers
Event title: 61st Annual Meeting of the Association for Computational Linguistics
Event location: Toronto, Canada
Event dates: 09.-14.07.2023
URL / URN: https://aclanthology.org/2023.acl-long.799/

Free keywords: UKP_p_LOEWE_Spitzenprofessur
Department(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 09 Aug 2023 09:34
Last modified: 11 Aug 2023 06:27
PPN: 510577113