I recently wrote a Vector 3 class, and I submitted my normalize() function for reviewal to a friend. He said it was good, but that I should multiply by the reciprocal where possible because "multiplying is cheaper than dividing" in CPU time.
My question simply is, why is that?
Division and modulus are more than twice as expensive as multiplication (a weight 10). The division by two or a multiple of two is always a trick, but not much more can be done without having side-effects. If you can replace division by multiplication, you do get a speed-up of more than two.
Multiplication is faster than division.
On average multiplication requires fewer steps in the algorithm. For instance multiplying two three-digit numbers is quite easy but dividing even single digit numbers (e.g., 1/7) may require lots of iterations compared to input size.
Division is slower than multiplication because there is no direct, parallel method for calculating it : Either there is an iteration, or hardware is copied to implement the iteration as cascaded (or pipelined) blocks.
Think about it in terms of elementary operations that hardware can more easily implement -- add, subtract, shift, compare. Multiplication even in a trivial setup requires fewer such elementary steps -- plus, it afford advances algorithms that are even faster -- see here for example... but hardware generally doesn't take advantage of those (except maybe extremely specialized hardware). For example, as the wikipedia URL says, "Toom–Cook can do a size-N cubed multiplication for the cost of five size-N multiplications" -- that's pretty fast indeed for very large numbers (Fürer's algorithm, a pretty recent development, can do Θ(n ln(n) 2Θ(ln*(n)))
-- again, see the wikipedia page and links therefrom).
Division's just intrisically slower, as -- again -- per wikipedia; even the best algorithms (some of which ARE implemented in HW, just because they're nowhere as sophisticated and complex as the very best algorithms for multiplication;-) can't hold a candle to the multiplication ones.
Just to quantify the issue with not-so-huge numbers, here are some results with gmpy, an easy-to-use Python wrapper around GMP, which tends to have pretty good implementations of arithmetic though not necessarily the latest-and-greatest wheezes. On a slow (first-generation;-) Macbook Pro:
$ python -mtimeit -s'import gmpy as g; a=g.mpf(198792823083408); b=g.mpf(7230824083); ib=1.0/b' 'a*ib'
1000000 loops, best of 3: 0.186 usec per loop
$ python -mtimeit -s'import gmpy as g; a=g.mpf(198792823083408); b=g.mpf(7230824083); ib=1.0/b' 'a/b'
1000000 loops, best of 3: 0.276 usec per loop
As you see, even at this small size (number of bits in the numbers), and with libraries optimized by exactly the same speed-obsessed people, multiplication by the reciprocal can save 1/3 of the time that division takes.
It may be only in rare situations that these few nanoseconds are a life-or-death issue, but, when they are, and of course IF you are repeatedly dividing by the same value (to amortize away the 1.0/b
operation!), then this knowledge can be a life-saver.
(Much in the same vein -- x*x
will often save time compared to x**2
[in languages that have a **
"raise to power" operator, like Python and Fortran] -- and Horner's scheme for polynomial computation is VASTLY preferable to repeated raise-to-power operations!-).
If you think back to grade school, you'll recall that multiplication was harder than addition and division was harder than multiplication. Things aren't any different for the CPU.
Recall also that calculating the reciprocal involves a division, so unless you calculate the reciprocal once and use it three times, you won't see a speed up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With