 

Performance of different math functions in x86?

I am writing a 3D collision routine and want to know how basic math operations compare in performance: +, -, *, /, sqrt, pow, and trigonometric functions such as sin, cos, tan and arcsin.

I have heard that it depends on many other factors, so I only want a rough idea of which operations are slower and should be avoided when choosing between different ways to solve the problem. I would also like to know the ordering and the magnitude of the differences.

Thanks

Edit: I write in VC++ for x86, but knowledge of other architectures and the general picture is welcome too. I mainly compute in single-precision floating point for a real-time application.

The problem is that some algorithms need sqrt or trigonometry, but I can avoid those by using other methods. Each approach has its own advantages, and I want to know whether the difference is large enough to justify the trade-off. I am after general knowledge that I can apply to my own problem; I searched Google but found nothing, so please let this question be answered.

Le Minh Duc asked Dec 12 '12 19:12



2 Answers

Speaking very broadly, and generalizing about recent common hardware:

  • addition, subtraction and multiplication are fast (capable of at least one operation per cycle per core).
  • division and square root are typically about an order of magnitude slower (tens of cycles per operation). There are many approximation algorithms that can be used to narrow this gap somewhat for specific usages.
  • the cost of calling math library functions (sin, cos, exp, log, etc) varies significantly depending on which math library implementation you are using and on what hardware. On (say) a current i7, something between one operation every ~20 cycles and one operation every ~200 cycles is typical, depending on the quality of the implementation and the specific function being called.
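
As an illustration of trading a slow operation for fast ones in a collision test, here is a minimal sketch of a sphere-sphere overlap check that compares squared distances and so needs no square root. The `Vec3` and `Sphere` types and the function names are hypothetical, made up for this example, and the squared comparison assumes non-negative radii.

```cpp
#include <cmath>

// Hypothetical minimal types for illustration; not taken from the question.
struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// Squared distance between two points: only subtractions, multiplications
// and additions, all of which are in the "fast" group.
inline float distance_squared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Straightforward version: pays for one square root per test.
inline bool spheres_overlap_sqrt(const Sphere& a, const Sphere& b) {
    return std::sqrt(distance_squared(a.center, b.center))
           <= a.radius + b.radius;
}

// Same decision without the square root: both sides of the comparison are
// squared. Assumes radii are non-negative, so squaring preserves the order.
inline bool spheres_overlap(const Sphere& a, const Sphere& b) {
    const float r = a.radius + b.radius;
    return distance_squared(a.center, b.center) <= r * r;
}
```

Whether the saving matters depends on how many such tests run per frame and on what else the loop does, so it is worth profiling before restructuring an algorithm around it.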
Stephen Canon answered Oct 23 '22 19:10



For a rough idea of relative cost, cheapest first: +, - < * < / < sqrt < sin, cos, etc.

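PS. On recent Intel architectures, for the scalar double-precision SSE2 instructions (here "throughput" is the number of cycles between the starts of successive independent instructions, so lower is better):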

ADDSD/SUBSD - 3 cycles latency, 1 cycle throughput

MULSD - 6-7 cycles latency, 2 cycles throughput

DIVSD - 38-39 cycles latency, 38-39 cycles throughput
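
Cycle counts like these differ between microarchitectures, so it can be worth measuring on the target machine. Below is a rough timing sketch, not a rigorous benchmark: `ns_per_op` is a hypothetical helper written for this example. It feeds each result back into the next call, so it measures something closer to latency than throughput, and the figure includes loop overhead.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper: applies `op` to its own result `iterations` times and
// returns the average wall-clock time per call in nanoseconds. Each call
// depends on the previous one, so the calls cannot overlap in the pipeline.
// The final value is written to `*sink` so the compiler cannot discard the
// loop as dead code.
template <typename Op>
double ns_per_op(Op op, std::size_t iterations, float* sink) {
    float x = 1.0001f;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        x = op(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    *sink = x;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A possible use is `ns_per_op([](float x) { return std::sqrt(x) + 1.0f; }, 10000000, &sink)` compared against `ns_per_op([](float x) { return 1.0f / x + 1.0f; }, 10000000, &sink)`. The `+ 1.0f` is there only to keep the value in a normal range across iterations, and its cost is included in the result. Build with optimizations enabled and treat the numbers as a relative ranking rather than exact cycle counts.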

chill answered Oct 23 '22 19:10
