I apologize if I'm asking something very obvious.
Assume you are designing a piece of software which is heavy in floating point computation and you get to buy your own hardware. Assume that you rule out FPGAs and GPUs for reasons of flexibility and ease of code maintenance.
Assume further you have a decent level of parallelism in the software.
For a long time, that meant you were stuck with x86.
I am looking for an objective benchmark that would tell whether modern ARM CPUs are in the same ballpark. Maybe I'm searching wrong, but I find it very difficult to locate a trustworthy benchmark (something like LAPACK or maybe some physical simulation). I understand that performance is task dependent and that compiler optimizations are probably better on x86 at present, but at this stage I'm really only looking to compare orders of magnitude.
Also, I find it strange that you can't really buy something along the lines of a Raspberry Pi, but with 8-64 modern cores comparable to those in the newest smartphones (like the newest Snapdragons) connected to a single bus. Do correct me if I'm mistaken, but such solutions may one day overtake GPUs in the FLOPS/$ category in addition to being more flexible.
Below are my Linpack benchmark results for PCs under Linux, the Raspberry Pi and Android devices (I have many more under Windows). They are based on my 1996 C/C++ conversion for PCs, which was approved by Jack Dongarra, the original author, and is available at:
http://www.netlib.no/netlib/benchmark/linpack-pc.c
This is for a matrix of order 100, in double precision. Results below include some at single precision. Dongarra’s historic results for this and supercomputer varieties are in:
http://netlib.org/benchmark/performance.pdf
This is just one benchmark, and others tell a different story. You can obtain many more from my site, including source code and MP varieties (free, with no ads):
http://www.roylongbottom.org.uk/
Linux 32/64 Bit Results

Double Precision 100x100, compiled at 32 and 64 bits

CPU                      MHz  Opt MFLOPS  No opt MFLOPS
Atom N455      32b Ub   1666         196             94
Atom N455      64b Ub   1666         226             89
Core 2 Mob     32b Ub   1830         983            307
Athlon 64      32b Ub   2211         936            231
Athlon 64      64b Ub   2211        1118            221
Core 2 Duo     32b Ub   2400        1288            404
Core 2 Duo     64b Ub   2400        1577            378
Phenom II      32b Ub   3000        1464            411
Phenom II      64b Ub   3000        1887            411
Phenom II      64b Fe   3000        1872            407
Core i7 930    64b Ub   ****        2265            511
Core i7 4820K  32b Ub   $$$1        2534            988
Core i7 4820K  64b Ub   $$$1        3672            900
Core i7 4820K  AVX Ub  $$$12        5413            935

Ub = Ubuntu Linux, Fe = Fedora Linux
****  Rated at 2800 MHz but running at up to 3066 MHz using Turbo Boost
$$$1  Rated at 3700 MHz but running at up to 3900 MHz using Turbo Boost
$$$12 As $$$1, but compiled with GCC 4.8.2, which produces AVX SIMD instructions
######################################################
Android and Raspberry Pi Versions

Double Precision and Single Precision (SP) 100x100

CPU          MHz   Android  v7/v5 MFLOPS  v5 MFLOPS
ARM 926EJ    800   2.2               5.7        5.6
ARM v7-A8    800   2.3.5            80.2
ARM v7-A9    800   2.3.4           101.4       10.6
ARM v7-A9   1300a  4.1.2           151.1       17.1
ARM v7-A9   1500   4.0.3           171.4
ARM v7-A9   1500a  4.0.3           155.5       16.9
ARM v7-A9   1400   4.0.4           184.4       19.9
ARM v7-A9   1600   4.0.3           196.5
ARM v7-A15  2000b  4.2.2           459.2       28.8

CPU          MHz   Android  v7 SP MFLOPS  Java MFLOPS
ARM 926EJ    800   2.2               9.6          2.3
ARM v7-A9    800   2.3.4           129.1         33.3
ARM v7-A9   1300a  4.1.2           201.3         56.4
ARM v7-A9   1500a  4.0.3           204.6         56.9
ARM v7-A9   1400   4.0.4           235.5         57.0
ARM v7-A15  2000b  4.2.2           803.0        143.1
Atom Ax86   1666   2.2.1                         15.7
Core 2 Ax86 2400   2.2.1                         53.3

Raspberry Pi

CPU          MHz   Linux    DP MFLOPS  SP MFLOPS
ARM 1176     700   3.6.11          42         58
ARM 1176    1000   3.6.11          68         88

CPU          MHz   Android  NEON SP MFLOPS
ARM v7-A9    800   2.3.4             255.8
ARM v7-A9   1300a  4.1.2             376.0
ARM v7-A9   1500a  4.0.3             382.5
ARM v7-A9   1400   4.0.4             454.2
ARM v7-A15  2000b  4.2.2            1334.9