Learning to Find Bugs in Programs and their Documentation

Habib, Andrew (2021):
Learning to Find Bugs in Programs and their Documentation. (Publisher's Version)
Darmstadt, Technische Universität,
DOI: 10.26083/tuprints-00017377,
[Ph.D. Thesis]

Abstract

Although software is pervasive, almost all programs suffer from bugs and errors. To detect software bugs, developers use various techniques such as static analysis, dynamic analysis, and model checking. However, none of these techniques is bulletproof.

This dissertation argues that learning from programs and their documentation is an effective means to prevent and detect software bugs. The main observation motivating our work is that traditional bug detection techniques often under-utilize software documentation. Leveraging the documentation together with the program itself, whether its source code or its runtime behavior, enables us to build unconventional bug detectors that benefit from both the richness of natural language documentation and the formal algorithm encoded in the program. More concretely, we present techniques that use the documentation of a program and the program itself to (i) improve the documentation by inferring missing information and detecting inconsistencies, and (ii) find bugs in the source code or runtime behavior of the program. A key insight we build on is that machine learning provides a powerful means to deal with the fuzziness and nuances of natural language in software documentation, and that source code is repetitive enough to also allow statistical learning from it. Therefore, several of the techniques proposed in this dissertation include a learning component that learns from documentation, source code, runtime behavior, or combinations thereof.

We envision a two-fold impact of our work. First, we provide developers with novel bug detection techniques that complement traditional ones. Because our approaches learn bug detectors end-to-end from data, they do not require complex analysis frameworks. Second, we hope that our work will open the door to more research on automatically utilizing natural language in software development. Future work should explore how to extract richer information from natural language to automate software engineering tasks, and how to use the programs themselves to advance the state of the practice in software documentation.

Item Type: Ph.D. Thesis
Published: 2021
Creators: Habib, Andrew
Status: Publisher's Version
Title: Learning to Find Bugs in Programs and their Documentation
Language: English
Place of Publication: Darmstadt
Collation: xiv, 211 pages
Divisions: 20 Department of Computer Science
20 Department of Computer Science > SOLA - Software Lab
Date Deposited: 08 Feb 2021 15:03
DOI: 10.26083/tuprints-00017377
Official URL: https://tuprints.ulb.tu-darmstadt.de/17377
URN: urn:nbn:de:tuda-tuprints-173778
Referees: Mezini, Prof. Dr. Mira ; Pradel, Prof. Dr. Michael ; Devanbu, Prof. Dr. Premkumar T.
Date of defense / oral examination: 14 December 2020
Alternative Abstract: German (a German-language version of the abstract above)