Evaluating the Evaluations of Code Recommender Systems: A Reality Check

Proksch, Sebastian ; Amann, Sven ; Nadi, Sarah ; Mezini, Mira (2016)
Evaluating the Evaluations of Code Recommender Systems: A Reality Check.
In: International Conference on Automated Software Engineering
doi: 10.1145/2970276.2970330
Article, Bibliography

Abstract

While researchers develop many new exciting code recommender systems, such as method-call completion, code-snippet completion, or code search, an accurate evaluation of such systems is always a challenge. We analyzed the current literature and found that most of the current evaluations rely on artificial queries extracted from released code, which raises the question: Do such evaluations reflect real-life usage? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as a ground truth and extract 7,157 real queries for a specific method-call recommender system. We compare the results of such real queries with different artificial evaluation strategies and check several assumptions that are repeatedly used in research but never empirically evaluated. We find that an evolving context, which is often observed in practice, has a major effect on the prediction quality of recommender systems, but is not commonly reflected in artificial evaluations.

Item type: Article
Published: 2016
Author(s): Proksch, Sebastian ; Amann, Sven ; Nadi, Sarah ; Mezini, Mira
Entry type: Bibliography
Title: Evaluating the Evaluations of Code Recommender Systems: A Reality Check
Language: English
Publication date: 25 August 2016
Publisher: IEEE/ACM
Journal or series title: International Conference on Automated Software Engineering
DOI: 10.1145/2970276.2970330
Uncontrolled keywords: General and reference: Evaluation; Information systems: Recommender systems; Software and its engineering: Software notations and tools; Human-centered computing: Design and evaluation methods; Empirical Study; Artificial Evaluation; IDE Interaction Data
Division(s): 20 Department of Computer Science
20 Department of Computer Science > Software Technology
Date deposited: 28 Sep 2016 09:08
Last modified: 28 Sep 2016 09:08
Sponsors: German Federal Ministry of Education and Research (BMBF), grant no. 01IS12054; German Science Foundation (DFG), CROSSING Collaborative Research Center (SFB #1119, project E1)