Nikitenko, Dmitry A. ; Wolf, Felix ; Mohr, Bernd ; Hoefler, Torsten ; Stefanov, Konstantin S. ; Voevodin, Vadim Vladimirovich ; Antonov, Aleksandr Sergeevich ; Calotoiu, Alexandru (2021)
Influence of Noisy Environments on Behavior of HPC Applications.
In: Lobachevskii Journal of Mathematics, 42 (7)
doi: 10.1134/S1995080221070192
Article, Bibliography
Abstract
Many contemporary HPC systems expose their jobs to substantial amounts of interference, leading to significant run-to-run variation. For example, application runtimes on Theta, a Cray XC40 system at Argonne National Laboratory, vary by up to 70%, caused by a mix of node-level and system-level effects, including network and file-system congestion in the presence of concurrently running jobs. This makes performance measurements generally irreproducible, heavily complicating performance analysis and modeling. On noisy systems, performance analysts usually have to repeat performance measurements several times and then apply statistics to capture trends. First, this is expensive and, second, extracting trends from a limited series of experiments is far from trivial, as the noise can follow quite irregular patterns. Attempts to learn from performance data how a program would perform under different execution configurations experience serious perturbation, resulting in models that reflect noise rather than intrinsic application behavior. On the other hand, although noise heavily influences execution time and energy consumption, it does not change the computational effort a program performs. Effort metrics that count how many operations a machine executes on behalf of a program, such as floating-point operations, the exchange of MPI messages, or file reads and writes, remain largely unaffected and, rare non-determinism set aside, reproducible. This paper addresses the initial stage of the ExtraNoise project, which is aimed at revealing and tackling key questions of system noise influence on HPC applications.
Item type: | Article |
---|---|
Published: | 2021 |
Author(s): | Nikitenko, Dmitry A. ; Wolf, Felix ; Mohr, Bernd ; Hoefler, Torsten ; Stefanov, Konstantin S. ; Voevodin, Vadim Vladimirovich ; Antonov, Aleksandr Sergeevich ; Calotoiu, Alexandru |
Type of entry: | Bibliography |
Title: | Influence of Noisy Environments on Behavior of HPC Applications |
Language: | English |
Publication date: | 9 August 2021 |
Publisher: | Springer |
Journal or publication title: | Lobachevskii Journal of Mathematics |
Volume: | 42 |
Issue number: | 7 |
DOI: | 10.1134/S1995080221070192 |
Uncontrolled keywords: | DFG\|449683531, DFG, 449683531 |
Division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Parallel Programming |
Date deposited: | 20 Mar 2024 13:33 |
Last modified: | 06 Jun 2024 10:52 |
PPN: | 518869598 |