Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c++ practical computational complexity of <cmath> SQRT()

What is the difference in CPU cycles (or, in essence, in 'speed') between

 x /= y;

and

 #include <cmath>
 x = sqrt(y);

EDIT: I know the operations aren't equivalent, I'm just arbitrarily proposing x /= y as a benchmark for x = sqrt(y)

like image 942
Matt Munson Avatar asked Jul 30 '11 16:07

Matt Munson


People also ask

What is the time complexity of sqrt function?

Time Complexity: O(√X). Only one traversal of the solution is needed, so the time complexity is O(√X).

Does cmath have sqrt?

The sqrt() function in C++ returns the square root of a number. This function is defined in the cmath header file. Mathematically, sqrt(x) = √x .

Is math sqrt constant time?

sqrt is O(1). It loops 52 times: that's not Log(N), that's constant time, much less efficient than using the floating point opcode, but still constant time.

What is Big O of root n?

As a result its time complexity is O(sqrt(n)) = O(sqrt(2^s)) = O(2^(s/2)) , where s is the size of the input, which is exponential.


1 Answers

The answer to your question depends on your target platform. Assuming you are using most common x86 cpus, I can give you this link http://instlatx64.atw.hu/ This is a collection of measured instruction latency (How long will it take to CPU to get result after it has argument) and how they are pipelined for many x86 and x86_64 processors. If your target is not x86, you can try to measure cost yourself or consult with your CPU documentation.

Firstly you should get a disassembler of your operations (from compiler e.g. gcc: gcc file.c -O3 -S -o file.asm or via dissasembly of compiled binary, e.g. with help of debugger). Remember, that In your operation there is loading and storing a value, which must be counted additionally.

Here are two examples from friweb.hu:

For Core 2 Duo E6700 latency (L) of SQRT (both x87, SSE and SSE2 versions)

  • 29 ticks for 32-bit float; 58 ticks for 64-bit double; 69 ticks for 80-bit long double;

of DIVIDE (of floating point numbers):

  • 18 ticks for 32-bit; 32 ticks for 64-bit; 38 ticks for 80-bit

For newer processors, the cost is less and is almost the same for DIV and for SQRT, e.g. for Sandy Bridge Intel CPU:

Floating-point SQRT is

  • 14 ticks for 32 bit; 21 ticks for 64 bit; 24 ticks for 80 bit

Floating-point DIVIDE is

  • 14 ticks for 32 bit; 22 ticks for 64 bit; 24 ticks for 80 bit

SQRT even a tick faster for 32bit.

So: For older CPUs, sqrt is itself 30-50 % slower than fdiv; For newer CPU the cost is the same. For newer CPU, cost of both operations become lower that it was for older CPUs; For longer floating format you needs more time; e.g. for 64-bit you need 2x time than for 32bit; but 80-bit is cheapy compared with 64-bit.

Also, newer CPUs have vector operations (SSE, SSE2, AVX) of the same speed as scalar (x87). Vectors are of 2-4 same-typed data. If you can align your loop to work on several FP values with same operation, you will get more performance from CPU.

like image 150
osgx Avatar answered Oct 26 '22 12:10

osgx