We are pleased to announce that we’ll organize a BoF session
Pros and Cons of HPCx benchmarks
during SC18. Schedule: Nov 13 (Tue) 12:15-13:15 D174
Erich Strohmaier
Walter Lioen (slides)
Kei Hiraki (slides)
Jun Makino (slides)
(Jose Gracia)
(John Shalf)
Abstract
HPL has been, and still is, the single most widely used benchmark for HPC systems, even though it has drawn many criticisms. The most important criticism is that HPL measures only the peak floating-point performance and that its result has little correlation with real application performance. HPCG (and also HPGMG) have been proposed as either alternative or complementary benchmarks. HPCG measures mainly the main memory bandwidth. In this BoF, we want to exchange opinions on which aspects of machines these benchmarks should measure, and how, in particular when they are used as requirements for new machines.
Detailed proposal:
The HPL benchmark has been the most widely accepted measure for HPC systems, at least for the last two decades. On the other hand, there have been long-standing criticisms that HPL measures only one aspect of the performance of HPC systems: the peak floating-point performance. Since the main part of the computation in HPL can be transformed into multiplications of dense matrices, the HPL performance number reflects the efficiency of the DGEMM implementation and the absolute peak performance, and not much else.
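To make the point concrete, the following is a minimal sketch of how an HPL-style score can be computed from a problem size and a run time. It assumes the operation count (2/3)n^3 + (3/2)n^2 that HPL credits for an n x n dense solve; the function names and the machine figures in the usage note are hypothetical.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations credited for an n x n dense solve.

    The n^3 term comes from the LU factorization, which is dominated by
    dense matrix-matrix multiplication; the n^2 term covers the solve.
    """
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Reported rate in Gflop/s for a run that took `seconds`."""
    return hpl_flop_count(n) / seconds / 1.0e9


def hpl_efficiency(n: int, seconds: float, peak_gflops: float) -> float:
    """Fraction of the theoretical peak that the run achieved."""
    return hpl_gflops(n, seconds) / peak_gflops
```

For example, `hpl_efficiency(100_000, 1000.0, 1000.0)` evaluates the efficiency of a hypothetical 1 Tflop/s machine that solves an n = 100,000 system in 1000 seconds. Because the count is fixed by n alone, the score depends only on how fast the dense kernels run.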
In practice, however, it is not that simple to achieve high efficiency on HPL, since other parts of the calculation, such as the pivot search and the row exchange, can dominate the calculation time when the size of the matrix is small. On the other hand, this means that if the size of the main memory is large enough, one can always achieve high efficiency on HPL.
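The dependence on matrix size can be illustrated with a toy two-term cost model. This is only a sketch under stated assumptions: the dense update costs (2/3)n^3 operations at the peak rate, while the pivot search and row exchange touch on the order of n^2 elements at a much lower rate. The function names and both rate parameters are hypothetical, not measurements of any machine.

```python
import math


def max_n_for_memory(memory_bytes: int) -> int:
    """Largest n such that an n x n matrix of 8-byte doubles fits in memory."""
    return math.isqrt(memory_bytes // 8)


def toy_hpl_efficiency(n: int, flop_rate: float, panel_rate: float) -> float:
    """Fraction of peak achieved under the two-term toy model.

    flop_rate  : peak floating-point rate in flop/s (assumed for the O(n^3) part)
    panel_rate : elements per second for pivot search and row exchange
                 (assumed for the O(n^2) part)
    """
    t_gemm = (2.0 / 3.0) * n**3 / flop_rate   # time at peak for the dense update
    t_panel = n**2 / panel_rate               # time for the O(n^2) remainder
    return t_gemm / (t_gemm + t_panel)
```

In this model the efficiency equals 1 / (1 + 1.5 * flop_rate / (panel_rate * n)), so it rises toward 1 as n grows. Since the largest usable n scales with the square root of the memory size, a larger memory directly buys higher HPL efficiency.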
HPCG was proposed as a possible alternative to HPL. As its name suggests, HPCG measures the performance of HPC systems on iterations of the Conjugate Gradient method for solving a large sparse linear system. Thus, in HPCG, the most time-consuming part of the calculation is the multiplication of a sparse matrix and a vector using indirect access, without much room for further optimization.
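The kernel in question can be sketched as a sparse matrix-vector product in compressed sparse row (CSR) form. This is an illustrative pure-Python version, not the HPCG reference code; the function and argument names are assumptions made for this example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the element of x to read is looked up via col_idx.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A 3 x 3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]].
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])  # [0.0, 0.0, 4.0]
```

Each nonzero contributes one multiply and one add, but requires loading a value, a column index, and an element of x, which is why the kernel is limited by memory traffic rather than arithmetic.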
HPCG's choice to prohibit certain optimizations is quite different from the rules of HPL, under which implementers are allowed to apply any optimization except those that reduce the total number of floating-point operations executed.
As a result, HPCG measures essentially one single number: the main memory bandwidth for contiguous memory access. Of course, if the main memory is very small, we’ll see the effect of the network performance.
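A rough, roofline-style estimate shows why the result tracks memory bandwidth. The sketch below assumes a CSR-like layout in which each nonzero costs 2 floating-point operations and 12 bytes of traffic (an 8-byte value plus a 4-byte column index), and it ignores traffic for the vectors and row pointers, so it is an optimistic bound. The function names and the machine figures in the usage note are hypothetical.

```python
def bandwidth_bound_gflops(bandwidth_gbs: float,
                           flops_per_nonzero: float = 2.0,
                           bytes_per_nonzero: float = 12.0) -> float:
    """Upper bound on sparse matrix-vector Gflop/s set by memory bandwidth.

    bandwidth_gbs : sustained main memory bandwidth in GB/s
    The defaults assume one multiply-add per nonzero and an 8-byte value
    plus a 4-byte index streamed from memory for each nonzero.
    """
    return bandwidth_gbs * flops_per_nonzero / bytes_per_nonzero


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline-style estimate: the lower of the compute and memory limits."""
    return min(peak_gflops, bandwidth_bound_gflops(bandwidth_gbs))
```

For a hypothetical node with a peak of 1000 Gflop/s and 120 GB/s of memory bandwidth, `attainable_gflops(1000.0, 120.0)` gives 20 Gflop/s, or 2% of peak. Under these assumptions, raising the peak changes nothing, while doubling the bandwidth doubles the estimate.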
It is obvious that neither HPL nor HPCG is sufficient to describe the
performance characteristics of HPC systems. Thus the natural question
is what set of benchmarks should be used to measure what aspect of
machines. Of course, if the machine will be used for a limited number of
applications, in principle we can use the applications themselves to
evaluate the hardware. However, it is not always possible to run full
applications on new machines, in particular if they do not yet exist.
This BoF will discuss what factors determine the performance of real-world
applications, and how we can design benchmarks which can be used to measure
these factors.