
ARM vs x86 for floating point

I apologize if I'm asking something very obvious.

Assume you are designing a piece of software which is heavy in floating point computation and you get to buy your own hardware. Assume that you rule out FPGAs and GPUs for reasons of flexibility and ease of code maintenance.

Assume further you have a decent level of parallelism in the software.

For a long time, that meant you were stuck with x86.

I am looking for an objective benchmark that would tell me whether modern ARM CPUs are in the same ballpark. Maybe I'm searching wrong, but I find it very difficult to locate a trustworthy benchmark (something like LAPACK or maybe a physical simulation). I understand that performance is task dependent and that compiler optimizations are probably still better on x86, but at this stage I only want to compare orders of magnitude.

Also, I find it strange that you can't really buy something along the lines of a Raspberry Pi, but with 8-64 modern cores comparable to those in the newest smartphones (such as the newest Snapdragons) connected to a single bus. Do correct me if I'm mistaken, but such solutions may one day overtake GPUs in FLOPS/$ while also being more flexible.

ziutek asked Feb 12 '23 16:02


1 Answer

Below are my Linpack Benchmark results for PCs running Linux, for the Raspberry Pi and for Android devices (I have many more from Windows). They are based on my 1996 C/C++ conversion for PCs, which was approved by Jack Dongarra, the original author, and is available from:

http://www.netlib.no/netlib/benchmark/linpack-pc.c

The benchmark solves a matrix of order 100 in double precision. The results below also include some at single precision. Dongarra’s historic results for this and the supercomputer varieties are in:

http://netlib.org/benchmark/performance.pdf
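For readers unfamiliar with how a Linpack MFLOPS figure is derived, here is a minimal sketch in Python. It assumes the conventional Linpack operation count of 2n^3/3 + 2n^2 for a system of order n; the function names are my own illustration and are not taken from linpack-pc.c.

```python
def linpack_ops(n: int) -> float:
    """Nominal floating-point operation count for solving a dense
    system of order n (conventional Linpack count, assumed here)."""
    return (2.0 * n**3) / 3.0 + 2.0 * n**2


def mflops(n: int, seconds: float) -> float:
    """Convert a measured solve time into millions of operations per second."""
    if seconds <= 0.0:
        raise ValueError("seconds must be positive")
    return linpack_ops(n) / seconds / 1e6


def solve_time_us(n: int, rate_mflops: float) -> float:
    """Time in microseconds for one solve at a given MFLOPS rate.

    ops / (1e6 ops per second) is seconds * 1e6, i.e. microseconds.
    """
    if rate_mflops <= 0.0:
        raise ValueError("rate_mflops must be positive")
    return linpack_ops(n) / rate_mflops


if __name__ == "__main__":
    n = 100
    print(f"nominal operations for n={n}: {linpack_ops(n):.0f}")
    # Hypothetical timing, chosen only to illustrate the conversion.
    print(f"MFLOPS for a 1 ms solve: {mflops(n, 1e-3):.1f}")
```

With n = 100 the nominal count is only about 0.69 million operations, so a single solve takes well under a millisecond on a fast CPU. That is why the order-100 test mostly reflects cache-resident, single-core speed rather than memory bandwidth or parallel scaling.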

This is just one benchmark, and others tell a different story. Many more are available from my site, including source code and MP versions (free, with no ads):

http://www.roylongbottom.org.uk/

Linux 32/64 Bit Results

Double Precision 100x100 compiled at 32 and 64 bits 

                                   Opt    No opt
CPU                      MHz    MFLOPS    MFLOPS

Atom N455     32b  Ub   1666       196        94
Atom N455     64b  Ub   1666       226        89

Core 2 Mob    32b  Ub   1830       983       307

Athlon 64     32b  Ub   2211       936       231
Athlon 64     64b  Ub   2211      1118       221

Core 2 Duo    32b  Ub   2400      1288       404
Core 2 Duo    64b  Ub   2400      1577       378

Phenom II     32b  Ub   3000      1464       411
Phenom II     64b  Ub   3000      1887       411
Phenom II     64b  Fe   3000      1872       407

Core i7 930   64b  Ub   ****      2265       511

Core i7 4820K 32b  Ub   $$$1      2534       988
Core i7 4820K 64b  Ub   $$$1      3672       900
Core i7 4820K AVX  Ub   $$$12     5413       935

  Ub = Ubuntu Linux,   Fe = Fedora Linux        
 ****  Rated as 2800 MHz but running at up to   
       3066 MHz using Turbo Boost               
 $$$1  Rated as 3700 MHz but running at up to   
       3900 MHz, using Turbo Boost              
 $$$12 As $$$1, but compiled with GCC 4.8.2 that
       produces AVX SIMD instructions.

######################################################

      Android and Raspberry Pi Versions

Double Precision and Single Precision (SP) 100x100

                               v7/v5       v5 
CPU          MHz   Android    MFLOPS    MFLOPS

ARM 926EJ    800       2.2       5.7       5.6
ARM v7-A8    800     2.3.5      80.2          
ARM v7-A9    800     2.3.4     101.4      10.6
ARM v7-A9   1300a    4.1.2     151.1      17.1
ARM v7-A9   1500     4.0.3     171.4          
ARM v7-A9   1500a    4.0.3     155.5      16.9
ARM v7-A9   1400     4.0.4     184.4      19.9
ARM v7-A9   1600     4.0.3     196.5          
ARM v7-A15  2000b    4.2.2     459.2      28.8

                               v7 SP     Java 
CPU          MHz   Android    MFLOPS    MFLOPS

ARM 926EJ    800       2.2       9.6       2.3
ARM v7-A9    800     2.3.4     129.1      33.3
ARM v7-A9   1300a    4.1.2     201.3      56.4
ARM v7-A9   1500a    4.0.3     204.6      56.9
ARM v7-A9   1400     4.0.4     235.5      57.0
ARM v7-A15  2000b    4.2.2     803.0     143.1


Atom   Ax86 1666     2.2.1                15.7
Core 2 Ax86 2400     2.2.1                53.3

Raspberry Pi                    DP        SP  
CPU          MHz     Linux    MFLOPS    MFLOPS

ARM  1176    700     3.6.11     42        58  
ARM  1176   1000     3.6.11     68        88  

                              NEON SP         
CPU          MHz   Android    MFLOPS          

ARM v7-A9    800     2.3.4     255.8          
ARM v7-A9   1300a    4.1.2     376.0          
ARM v7-A9   1500a    4.0.3     382.5          
ARM v7-A9   1400     4.0.4     454.2          
ARM v7-A15  2000b    4.2.2    1334.9        
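To put the tables on a common footing, one rough approach is to divide MFLOPS by clock speed. The sketch below does that for a few double-precision rows copied from the results above. Two assumptions are mine: the Core i7 4820K is taken at the 3900 MHz Turbo Boost ceiling from the footnote, and the "b" suffix on the A15 clock is ignored so that it counts as 2000 MHz. The helper name is hypothetical.

```python
# (label, MHz, optimised double-precision MFLOPS), copied from the tables above.
# Assumption: i7 clock is the 3900 MHz turbo ceiling; A15 "2000b" read as 2000 MHz.
RESULTS = [
    ("Atom N455 64b", 1666, 226.0),
    ("Core i7 4820K 64b", 3900, 3672.0),
    ("ARM v7-A9", 1600, 196.5),
    ("ARM v7-A15", 2000, 459.2),
]


def mflops_per_mhz(mhz: float, mflops: float) -> float:
    """Per-clock throughput: MFLOPS divided by clock speed in MHz."""
    if mhz <= 0:
        raise ValueError("mhz must be positive")
    return mflops / mhz


if __name__ == "__main__":
    for label, mhz, rate in RESULTS:
        print(f"{label:<18} {rate:8.1f} MFLOPS  {mflops_per_mhz(mhz, rate):.3f} MFLOPS/MHz")
```

On these numbers the A15 is roughly a factor of eight behind the non-AVX 64-bit i7 result in raw MFLOPS and about a factor of four behind per MHz. Treat that as an order-of-magnitude indication for this one single-core test only.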
Roy Longbottom answered Mar 16 '23 00:03