These benchmarks are based on Intel's Optimized LINPACK Benchmark as distributed with version 10.1.1.019 of Intel's MKL libraries. On a single node, with the default settings, the xlinpack_xeon64 executable produces the results below.
The log file from the run is ...
Intel(R) LINPACK data
Current date/time: Sun Jan 25 03:08:37 2009
CPU frequency: 2.403 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:
Number of tests : 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used = 2593516256, at the size = 18000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
1000 1000 4 0.031 21.8866 9.419757e-13 3.212379e-02
1000 1000 4 0.030 22.5293 9.419757e-13 3.212379e-02
1000 1000 4 0.030 22.5124 9.419757e-13 3.212379e-02
1000 1000 4 0.030 22.5171 9.419757e-13 3.212379e-02
2000 2000 4 0.209 25.5189 4.657913e-12 4.051814e-02
2000 2000 4 0.210 25.3835 4.657913e-12 4.051814e-02
5000 5008 4 2.902 28.7323 2.350900e-11 3.278141e-02
5000 5008 4 2.900 28.7573 2.350900e-11 3.278141e-02
10000 10000 4 20.807 32.0502 8.841794e-11 3.117706e-02
10000 10000 4 20.816 32.0361 8.841794e-11 3.117706e-02
15000 15000 4 68.612 32.7998 2.083827e-10 3.282063e-02
15000 15000 4 68.614 32.7988 2.083827e-10 3.282063e-02
18000 18008 4 117.619 33.0613 2.989012e-10 3.273336e-02
18000 18008 4 117.617 33.0618 2.989012e-10 3.273336e-02
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 22.3613 22.5293
2000 2000 4 25.4512 25.5189
5000 5008 4 28.7448 28.7573
10000 10000 4 32.0432 32.0502
15000 15000 4 32.7993 32.7998
18000 18008 4 33.0616 33.0618
End of tests
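As a sanity check on the GFlops column, the figures can be approximately reproduced from the problem size and the elapsed time. The sketch below assumes the conventional LINPACK operation count of 2/3 n^3 + 2 n^2 floating-point operations for an n x n system; the log does not state which formula the benchmark uses, so treat this as an assumption. The function names are illustrative only.

```python
def linpack_flops(n: int) -> float:
    """Conventional LINPACK operation count for an n x n dense solve (assumed formula)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Sustained rate in GFlops for a solve of size n that took `seconds`."""
    return linpack_flops(n) / seconds / 1e9


# (size, time in seconds) pairs copied from the log above
runs = [(10000, 20.807), (15000, 68.612), (18000, 117.619)]

for n, t in runs:
    print(f"n={n:6d}  time={t:8.3f} s  {gflops(n, t):8.4f} GFlops")
```

For the larger sizes the computed values agree with the logged ones to within the rounding of the printed times. For the smallest size (1000) the printed time has only two significant digits, so the agreement is correspondingly coarser.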
At about 33 Gflops per node, the cluster would deliver roughly 0.3 Tflops on embarrassingly parallel problems. To put this number in perspective, the slowest entry on the top500 list of November 2008 was reported at 22.1 Tflops (and that was not for an embarrassingly parallel problem).
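The arithmetic behind this comparison can be sketched as follows. The node count is not stated in the text; the value computed here is only what the quoted 0.3 Tflops implies under perfect scaling, and the variable names are illustrative.

```python
per_node_gflops = 33.06   # best sustained rate from the single-node log
cluster_tflops = 0.3      # aggregate figure quoted in the text
top500_last_tflops = 22.1 # slowest top500 entry as quoted in the text

# Node count implied by the quoted aggregate, assuming perfect scaling
# (an inference, not a number given in the text).
implied_nodes = cluster_tflops * 1000.0 / per_node_gflops

# How many times faster the quoted slowest top500 entry is than the cluster.
ratio = top500_last_tflops / cluster_tflops

print(f"implied node count : {implied_nodes:.1f}")
print(f"top500 last / cluster : {ratio:.1f}x")
```

In other words, the quoted aggregate corresponds to roughly nine nodes, and the slowest top500 entry, as quoted, is a little over seventy times faster.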
Since top500 has been mentioned, this is perhaps the proper place for a little cross-validated propaganda. Here it comes: the representation of the various operating system families in the November 2008 list is:
Operating system family | Count | Share (%) |
Linux | 439 | 87.8 % |
Windoze | 5 | 1.0 % |
Unix | 23 | 4.6 % |
BSD Based | 1 | 0.2 % |
Mixed | 31 | 6.2 % |
Mac OS | 1 | 0.2 % |
Totals | 500 | 100.0 % |
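The share column follows directly from the counts. A minimal sketch that recomputes it, with the counts copied from the table above (the dictionary name is illustrative):

```python
# Counts per operating system family, copied from the table above
counts = {
    "Linux": 439,
    "Windoze": 5,
    "Unix": 23,
    "BSD Based": 1,
    "Mixed": 31,
    "Mac OS": 1,
}

total = sum(counts.values())

# Percentage share of each family, rounded to one decimal as in the table
shares = {name: round(100 * c / total, 1) for name, c in counts.items()}

for name, c in counts.items():
    print(f"{name:10s} | {c:3d} | {shares[name]:5.1f} % |")
print(f"{'Totals':10s} | {total:3d} | {100.0:5.1f} % |")
```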