Performance comparison of FPU with software emulation

Tags: performance, floating-point, scientific-computing, fpu

While I know (or so I have been told) that floating-point coprocessors work faster than any software implementation of floating-point arithmetic, I totally lack a gut feeling for how large this difference is, in orders of magnitude.

The answer probably depends on the application and on where you work on the spectrum from microprocessors to supercomputers. I am particularly interested in computer simulations.

Can you point me to articles or papers on this question?

asked Mar 02 '13 11:03

shuhalo

1 Answer

A general answer will obviously be very vague, because performance depends on so many factors.

However, based on my understanding, in processors that do not implement floating point (FP) operations in hardware, a software implementation will typically be 10 to 100 times slower (or even worse, if the implementation is bad) than integer operations, which are always implemented in hardware on CPUs.

The exact performance will depend on a number of factors, such as the features of the integer hardware: some CPUs lack an FPU, but have features in their integer arithmetic that help implement a fast software emulation of FP calculations.
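To give a feeling for where the cycles go, here is a minimal Python sketch of a single-precision multiplication done with integer operations only. It is not taken from the paper or from any real soft-float library; `soft_mul` and the helper names are made up for illustration, and zeros, subnormals, infinities, NaNs and overflow are deliberately left out.

```python
import struct

def f32_bits(x):
    """Bit pattern of x after rounding to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Float value of a 32-bit single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def soft_mul(a, b):
    """Multiply two single-precision bit patterns with integer ops only.

    Assumption: both inputs and the result are normal, non-zero numbers.
    """
    sign = (a ^ b) & 0x80000000
    # Add the biased exponents and remove one bias.
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127
    # Restore the implicit leading 1 of each 24-bit significand.
    ma = (a & 0x7FFFFF) | 0x800000
    mb = (b & 0x7FFFFF) | 0x800000
    prod = ma * mb                      # 47- or 48-bit product
    if prod & (1 << 47):                # product in [2, 4): renormalise
        shift = 24
        exp += 1
    else:                               # product in [1, 2)
        shift = 23
    mant = prod >> shift                # keep the top 24 bits
    rem = prod & ((1 << shift) - 1)     # bits that were shifted out
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant == 1 << 24:             # rounding carried into a new bit
            mant >>= 1
            exp += 1
    return sign | (exp << 23) | (mant & 0x7FFFFF)

result = bits_f32(soft_mul(f32_bits(1.5), f32_bits(2.5)))
print(result)
```

Even this stripped-down version needs several shifts, masks, a wide integer multiply and a rounding step for one FP multiply, and a real emulation routine also has to handle the special cases skipped here.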

The paper mentioned by njuffa, Cristina Iordache and Ping Tak Peter Tang, "An Overview of Floating-Point Support and Math Library on the Intel XScale Architecture", supports this. For the Intel XScale processor they list the following latencies (excerpt):

integer addition or subtraction:  1 cycle
integer multiplication:           2-6 cycles
fp addition (emulated):           34 cycles
fp multiplication (emulated):     35 cycles

So this would result in a factor of about 10-30 between integer and emulated FP arithmetic. The paper also mentions that the GNU implementation (the one the GNU compiler uses by default) is about 10 times slower still, for a total factor of 100-300.
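The ratios behind that estimate can be worked out directly from the excerpt. The short Python sketch below only redoes the arithmetic; the cycle counts are the ones quoted above, while the dictionary keys and the `slowdown` helper are names made up for illustration.

```python
# Latencies in cycles, copied from the XScale excerpt above.
# Each entry is a (min, max) pair so that ranges can be represented.
latency = {
    "int_add": (1, 1),
    "int_mul": (2, 6),
    "fp_add_emulated": (34, 34),
    "fp_mul_emulated": (35, 35),
}

def slowdown(emulated, native):
    """Return the (lowest, highest) ratio of emulated to native latency."""
    e_lo, e_hi = latency[emulated]
    n_lo, n_hi = latency[native]
    return e_lo / n_hi, e_hi / n_lo

add_ratio = slowdown("fp_add_emulated", "int_add")
mul_ratio = slowdown("fp_mul_emulated", "int_mul")
print("addition:       %.1fx to %.1fx" % add_ratio)
print("multiplication: %.1fx to %.1fx" % mul_ratio)
```

Addition comes out at 34 times slower and multiplication at roughly 6 to 18 times slower, so "about 10-30" is a rounded summary of both cases rather than an exact bound.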

Finally, note that the above is for the case where the FP emulation is compiled into the program by the compiler. Some operating systems (e.g. Linux and Windows CE) also have an FP emulation in the OS kernel. The advantage is that even code compiled without FP emulation (i.e. using FPU instructions) can run on a processor without an FPU: the kernel will transparently emulate unsupported FPU instructions in software. However, this emulation is even slower (about another factor of 10) than a software emulation compiled into the program, because of the additional overhead. Obviously, this case is only relevant on processor architectures where some processors have an FPU and some do not (such as x86 and ARM).
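The trap-and-emulate idea can be pictured with a toy model. The Python sketch below is purely illustrative and does not reflect how any real kernel is written: the instruction tuples, `cpu_execute`, `kernel_emulate` and the `IllegalInstruction` exception are all hypothetical stand-ins. The point is only that every FP instruction on an FPU-less processor takes a detour through a handler.

```python
class IllegalInstruction(Exception):
    """Stands in for the hardware trap raised on an unsupported opcode."""

def cpu_execute(instr, regs, has_fpu):
    """Toy CPU: executes one (op, dst, src1, src2) instruction."""
    op, dst, a, b = instr
    if op == "fadd":
        if not has_fpu:
            # No FPU present: the processor cannot execute this opcode.
            raise IllegalInstruction(instr)
        regs[dst] = regs[a] + regs[b]
    else:
        raise ValueError("unknown opcode: %r" % (op,))

def kernel_emulate(instr, regs):
    """Toy trap handler: decode the faulting instruction and emulate it."""
    op, dst, a, b = instr
    if op != "fadd":
        raise ValueError("cannot emulate: %r" % (op,))
    # A real handler would call integer-only soft-float code here.
    regs[dst] = regs[a] + regs[b]

def run(program, regs, has_fpu):
    """Run the program and return how many instructions had to trap."""
    traps = 0
    for instr in program:
        try:
            cpu_execute(instr, regs, has_fpu)
        except IllegalInstruction:
            traps += 1
            kernel_emulate(instr, regs)
    return traps

program = [("fadd", "f2", "f0", "f1"), ("fadd", "f3", "f2", "f2")]
regs = {"f0": 1.5, "f1": 2.25, "f2": 0.0, "f3": 0.0}
traps = run(program, regs, has_fpu=False)
print(traps, regs["f3"])
```

In this model every FP instruction traps once when `has_fpu` is false. On real hardware each such trap also involves switching into the kernel, saving state and decoding the instruction before the emulation itself runs, which is the additional overhead mentioned above.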

Note: This answer compares the performance of (emulated) FP operations with integer operations on the same processor. Your question might also be read to be about the performance of (emulated) FP operations compared to hardware FP operations (not sure what you meant). However, the result would be about the same, because if FP is implemented in hardware, it is typically (almost) as fast as integer operations.

answered Sep 20 '22 08:09

sleske
