Is integer multiplication really done at the same speed as addition on a modern CPU?

I hear this claim quite often: multiplication on modern hardware is so well optimized that it is effectively as fast as addition. Is that true?

I can never find any authoritative confirmation. My own research only raises more questions. Speed tests usually produce data that confuses me. Here is an example:

#include <stdio.h>
#include <sys/time.h>

// Millisecond timestamp built from gettimeofday(), truncated so it fits
// in an unsigned int.
unsigned int time1000() {
    timeval val;
    gettimeofday(&val, 0);
    val.tv_sec &= 0xffff;
    return val.tv_sec * 1000 + val.tv_usec / 1000;
}

int main() {
    unsigned int sum = 1, T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i + (i + 1);   // addition loop
        sum++;
    }
    printf("%u %u\n", time1000() - T, sum);

    sum = 1;
    T = time1000();
    for (int i = 1; i < 100000000; i++) {
        sum += i * (i + 1);   // multiplication loop
        sum++;
    }
    printf("%u %u\n", time1000() - T, sum);
}

The code above can show that multiplication is faster:

clang++ benchmark.cpp -o benchmark
./benchmark
746 1974919423
708 3830355456

But with other compilers, other compiler flags, or differently written inner loops, the results vary, and I cannot even get a rough estimate.

asked Feb 17 '14 by exebook



2 Answers

Multiplication of two n-bit numbers can in fact be done in O(log n) circuit depth, just like addition.

Addition in O(log n) is done by splitting the number in half and (recursively) adding the two parts in parallel, where the upper half is solved for both the "0-carry" and "1-carry" case. Once the lower half is added, the carry is examined, and its value is used to choose between the 0-carry and 1-carry case.
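As a concrete sketch of that scheme (my own toy code, not from the answer; the name carry_select_add16 is made up), here is a 16-bit add done in two 8-bit halves: the high half is computed for both possible carry-in values while the low half is still being added, and the real carry only has to select between the two precomputed results.

#include <cstdint>
#include <cstdio>

// Toy carry-select adder: add two 16-bit numbers by working on the two
// 8-bit halves "in parallel". The high half is computed twice, once
// assuming carry-in = 0 and once assuming carry-in = 1; the real carry
// out of the low half then just selects which precomputed result to keep.
uint16_t carry_select_add16(uint16_t a, uint16_t b) {
    uint8_t a_lo = a & 0xFF, b_lo = b & 0xFF;
    uint8_t a_hi = a >> 8,   b_hi = b >> 8;

    // These three additions are independent; in hardware they run concurrently.
    uint16_t lo    = uint16_t(a_lo) + b_lo;     // produces the real carry in bit 8
    uint8_t  hi_c0 = uint8_t(a_hi + b_hi);      // high half, assuming carry-in = 0
    uint8_t  hi_c1 = uint8_t(a_hi + b_hi + 1);  // high half, assuming carry-in = 1

    uint8_t carry = lo >> 8;                    // carry out of the low half
    uint8_t hi    = carry ? hi_c1 : hi_c0;      // select, instead of adding again

    return uint16_t((hi << 8) | (lo & 0xFFu));
}

int main() {
    printf("0x%04X\n", (unsigned)carry_select_add16(0x1234, 0x0FCD));  // prints 0x2201
}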

Multiplication in O(log n) depth is also done through parallelization, where every sum of 3 numbers is reduced to a sum of just 2 numbers in parallel, and the sums are done in some manner like the above.
I won't explain it here, but you can find reading material on fast addition and multiplication by looking up "carry-lookahead" and "carry-save" addition.
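Here is a similarly minimal sketch of the carry-save step (again my own code and naming, not the answer's): a 3:2 compressor rewrites a sum of three numbers as a sum of two using only bitwise operations, so carries never have to propagate until one ordinary addition at the very end. Compile with -std=c++17 or later for the structured bindings.

#include <cstdint>
#include <cstdio>
#include <utility>

// 3:2 compressor (carry-save adder): rewrite x + y + z as s + c using only
// bitwise operations, so no carry ripples through the word. s holds the
// per-bit sums, c holds the per-bit carries shifted into place.
std::pair<uint32_t, uint32_t> csa(uint32_t x, uint32_t y, uint32_t z) {
    uint32_t s = x ^ y ^ z;
    uint32_t c = ((x & y) | (y & z) | (x & z)) << 1;
    return {s, c};
}

int main() {
    // Reduce four "partial products" to two numbers with two CSA layers;
    // only the final step needs an ordinary carry-propagating addition.
    uint32_t p0 = 13, p1 = 22, p2 = 7, p3 = 91;   // arbitrary example values

    auto [s1, c1] = csa(p0, p1, p2);              // 3 numbers -> 2
    auto [s2, c2] = csa(s1, c1, p3);              // 3 numbers -> 2 again

    printf("%u == %u\n", s2 + c2, p0 + p1 + p2 + p3);  // 133 == 133
}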

So from a theoretical standpoint, since circuits are obviously inherently parallel (unlike software), the only reason multiplication would be slower is the constant factor in front, not the asymptotic complexity.

answered by user541686


Integer multiplication will be slower.

Agner Fog's instruction tables show that when using 32-bit integer registers, Haswell's ADD/SUB take 0.25–1 cycles (depending on how well pipelined your instructions are) while MUL takes 2–4 cycles. Floating-point is the other way around: ADDSS/SUBSS take 1–3 cycles while MULSS takes 0.5–5 cycles.
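Those ranges reflect the latency-versus-throughput distinction that the parenthetical about pipelining hints at. A rough way to see the latency side is a dependency-chain micro-benchmark, sketched below; this is my own code, not from the answer, the keep() helper relies on a GCC/Clang empty-asm trick to stop the optimizer from folding the loops, and the constant 0x9E3779B9 is arbitrary. Because every iteration consumes the previous result, the loop body cannot overlap with itself, so on a Haswell-class core one would expect roughly a 3x gap per iteration (about 1 cycle for ADD versus about 3 for IMUL); exact numbers depend on the CPU and on compiling with optimization (e.g. clang++ -O2).

#include <cstdio>
#include <chrono>

// Keep a value opaque to the optimizer without emitting any instructions
// (GCC/Clang-specific; same idea as the DoNotOptimize helpers found in
// benchmark libraries). Prevents the compiler from folding the loops below.
static inline void keep(unsigned &v) { __asm__ __volatile__("" : "+r"(v)); }

int main() {
    const int N = 100000000;
    using clk = std::chrono::steady_clock;
    unsigned x = 12345;

    // Serial chain of adds: each iteration needs the previous result, so the
    // loop runs at roughly one ADD latency per iteration.
    auto t0 = clk::now();
    for (int i = 0; i < N; i++) { x += 0x9E3779B9u; keep(x); }
    auto t1 = clk::now();

    // Serial chain of multiplies: same structure, so the loop runs at roughly
    // one IMUL latency per iteration (0x9E3779B9 is messy enough that
    // compilers emit a real multiply rather than shift/add tricks).
    for (int i = 0; i < N; i++) { x *= 0x9E3779B9u; keep(x); }
    auto t2 = clk::now();

    auto ms = [](clk::time_point a, clk::time_point b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    printf("add chain: %lld ms, mul chain: %lld ms (x=%u)\n",
           (long long)ms(t0, t1), (long long)ms(t1, t2), x);
}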

answered by Cory Nelson