Summary of Silvia Veronese benchmarks

Quick overview

This document summarizes the benchmarks from Silvia Veronese's work in Hans Othmer's research group in computational mathematical biology at the University of Utah Mathematics Department.

The benchmark results are available in two series of tables.

The longer first table contains the results obtained for all optimization levels on all machines.
The shorter second (executive summary) table contains the best single result obtained for each machine. This is probably the set of results that you want to look at.
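The executive-summary selection described above can be sketched as follows. This is a minimal illustration, not the script actually used to build these tables. The `Result` record and its field names are hypothetical, and the sketch assumes that each table covers a single benchmark.

```python
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical record layout: one row of the longer (all-levels) table.
    benchmark: str
    machine: str
    flags: str
    seconds: float


def executive_summary(results: list[Result]) -> list[Result]:
    """Keep only the fastest (smallest-time) result for each machine,
    then order the survivors from fastest to slowest."""
    best: dict[str, Result] = {}
    for r in results:
        current = best.get(r.machine)
        if current is None or r.seconds < current.seconds:
            best[r.machine] = r
    return sorted(best.values(), key=lambda r: r.seconds)
```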

Each table contains rows of results ordered in decreasing performance. Each row contains the

benchmark name,
vendor and model,
compiler optimization flags (possibly truncated, if they are very long),
benchmark time in seconds (on UNIX systems, this is the sum of the user and system times reported by the time command; this will not exceed the actual wall-clock time), and
relative performance (1.0000 is fastest).
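The row ordering and the relative-performance column can be reproduced with a short sketch like the one below. The text states only that 1.0000 is fastest, so the sketch assumes that relative performance is the fastest time divided by each entry's time. The function name and the sample machine names and times are hypothetical.

```python
def relative_performance(times: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (machine, seconds, relative performance) rows, fastest first.

    Assumed definition: relative performance = fastest_time / time, so the
    fastest entry scores 1.0000 and slower entries score proportionally less.
    """
    fastest = min(times.values())
    rows = [(name, t, fastest / t) for name, t in times.items()]
    rows.sort(key=lambda row: row[1])  # smallest time = highest performance
    return rows


# Hypothetical sample data, printed in the table's column order.
for name, t, rel in relative_performance({"machine-A": 250.0, "machine-B": 500.0}):
    print(f"{name:10s} {t:8.1f} {rel:6.4f}")
```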

Important disclaimer

Please remember that there is no answer to the commonly asked question "What is the fastest machine?"

Within the collection of benchmarks of which these are members, it is often possible to pick a single benchmark which rates a particular machine the fastest, and yet, on other benchmarks, the same machine may perform poorly with respect to competing models.

Particularly on modern RISC architectures, performance can be extremely sensitive to the quality of compiler optimizations; in at least one case, a speedup of a factor of fifty was seen over a range of compiler options on the same system.

The benchmarking of these programs has investigated a substantial number of compilation options and optimization levels, but it is possible that new releases of compilers, or alternative compilers, might improve the results significantly. We make reasonable efforts to keep our compilers and operating systems up to date with vendor software releases, but particularly with older machine models, or machines obtained on a short-term loan for evaluation purposes, it is frequently impossible to rerun the benchmarks after such new releases.

In computer benchmarking, it is imperative to examine a range of benchmark programs, chosen to represent the kinds of numerical computation that are important to you, before concluding which machine is best for your jobs.

Many other factors besides benchmark performance should affect computer purchasing decisions, including at least these:

vendor track records;
vendor future directions, and survivability in an increasingly competitive market;
hardware and software reliability;
ease of administration;
ease of maintenance;
ease of use;
initial cost;
ongoing cost-of-ownership, including license renewals and cost of repairs;
upgrade costs, particularly for memory and disk storage; and
availability of third-party commercial software that you expect to require.

Brief benchmark description

All of the benchmark programs described in this document are written in highly portable Fortran 77, and all represent real research programs using real data; they are not loop kernels or toy implementations. Program code sizes are given below.

`neu_grid`

[5339 lines (adi3d) + 4770 lines (nksol), for a total of 10109 lines of Fortran code]

This is a solver for nonlinear parabolic and elliptic partial differential equations (PDEs) in three space dimensions. The algorithm is based on an ADI (alternating direction implicit) scheme, that is, directional splitting. The code can call nksol for each of the three directions; optionally, two of the directions may instead be solved with a direct (tridiagonal) solver.
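In an ADI scheme, each implicit directional sweep that uses a three-point discretization of the second derivative produces a tridiagonal linear system along every grid line, and a direct solver handles each such system in O(n) operations. The sketch below is the textbook Thomas algorithm, shown only to illustrate what such a direct solver does. It is not the routine used in neu_grid, and it assumes a matrix (for example, a diagonally dominant one) for which elimination without pivoting is stable.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal   (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    Returns the solution as a list of floats.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: remove the sub-diagonal row by row.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, a backward-Euler step of u_t = D u_xx with r = D dt / dx**2 gives interior coefficients a = c = -r and b = 1 + 2r along each grid line.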

This particular case solves a wave-propagation problem for a single nonlinear parabolic PDE.

This code requires the libnksol.a library in the parent directory. For benchmarking, build it once and for all, at a high optimization level, with (cd nksol; make OPT='...') before doing the benchmark runs. The nksol code itself usually does not account for much of the run time of this program, so there is no need to rebuild it for each optimization level tested.
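The build-once procedure can be written out as a plan of shell commands. In this sketch only the `(cd nksol; make OPT=...)` step comes from the text; the `neu_grid` make target and the `time ./neu_grid` invocation are hypothetical placeholders for whatever the local Makefile and run procedure actually use, and the optimization flags are examples only. Because make does not normally rebuild up-to-date object files just because OPT changed, a real script would also have to remove neu_grid's object files between levels; that step is omitted here because the Makefile's targets are not described.

```python
import shlex


def build_plan(opt_levels: list[str], lib_opt: str) -> list[str]:
    """Return shell command lines for one benchmark series.

    libnksol.a is built exactly once, at a single high optimization level;
    only neu_grid itself is rebuilt and timed for each level under test.
    """
    # Build the library once (this step follows the text).
    plan = [f"(cd nksol; make OPT={shlex.quote(lib_opt)})"]
    for opt in opt_levels:
        plan.append(f"make neu_grid OPT={shlex.quote(opt)}")  # hypothetical target name
        plan.append("time ./neu_grid")  # hypothetical run command
    return plan


# Example flags only; real choices depend on the compiler under test.
for line in build_plan(["-O", "-O2", "-O3"], lib_opt="-O3"):
    print(line)
```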

The code has explicit declarations of all variables (verified by compilation with the -u switch on several UNIX Fortran compilers). It has been additionally processed by ftnchek to find instances of mixed single/double precision arithmetic, and all such problems have been repaired.

Only the neu_grid program is used here for benchmarking purposes. Other programs that can be built by this Makefile may require libraries that are available only on the author's system.

The program size is LARGE: with nx = 75 in params.inc, it requires about 176 MB on an HP 9000/735 system:

% size neu_grid
  text  data       bss       dec     hex filename
261444 30207 175793620 176085271 a7ed917 neu_grid

Because the running time and memory grow approximately as nx**3, we have reduced nx to 50, for which

% size neu_grid
  text data      bss      dec     hex filename
124559 5278 52162160 52291997 31de99d neu_grid
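The nx**3 rule can be checked against the two size listings. The sketch below (the function name is illustrative) scales the nx = 75 total (the dec column) down to nx = 50. Since the text and data segments do not grow with nx, the cube law is only approximate, but the prediction lands within 0.3% of the measured total.

```python
def scale_by_cube(value_at_ref: float, n_ref: int, n_new: int) -> float:
    """Estimate a quantity that grows like n**3 (memory, run time) at n_new."""
    return value_at_ref * (n_new / n_ref) ** 3


# Total size in bytes (the 'dec' column) measured at nx = 75 on the HP 9000/735.
estimate = scale_by_cube(176085271, 75, 50)
print(f"{estimate / 1e6:.1f} MB predicted")  # prints: 52.2 MB predicted (measured: 52291997 bytes)
```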

The input file for the neu_grid program, input.dat, can be adjusted in lines 5 and 6 to alter the run time. The starting time is always t = 0. Line 5 (default 0.1) is the time step size, dt. Line 6 is the ending time, tend (default 0.4, that is, 4 time steps of 0.1 each). At the end of each time step, t <- t + dt, and the loop terminates if t >= tend. With the highest optimization level and the default nx = 50, this program takes about 103 seconds per time step on an HP 9000/735.
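The stepping rule for input.dat can be mimicked to predict how many steps a given dt and tend produce, and from that a rough run time. This is an illustrative double-precision sketch with a hypothetical function name; it is not taken from neu_grid. Because t is accumulated by repeated addition, the count is sensitive to rounding: in this sketch, dt = 0.1 with tend = 1.0 gives 11 steps, because ten additions of 0.1 sum to slightly less than 1.0. Whether neu_grid behaves the same way depends on its own precision and arithmetic, which the text does not state.

```python
def count_time_steps(dt: float, tend: float) -> int:
    """Mimic the described loop: start at t = 0, take a step,
    advance t <- t + dt, and stop as soon as t >= tend."""
    t = 0.0
    steps = 0
    while True:
        steps += 1  # one full time step would be computed here
        t += dt
        if t >= tend:
            return steps


SECONDS_PER_STEP = 103.0  # rough HP 9000/735 figure quoted above, nx = 50
n = count_time_steps(0.1, 0.4)
print(n, "steps, about", n * SECONDS_PER_STEP, "s")  # prints: 4 steps, about 412.0 s
```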