In the past few days I've been working on improving the runtime of a Python function that requires many uses of the remainder operation (%), among other things. My main test case is an 80,000-element NumPy array (monotonically increasing), with 10,000 iterations, although I've tried various other sizes as well.
Eventually I reached a point where the remainder operation was a major bottleneck, and I tried various solutions. This is the behaviour I found when running the following code:
import numpy as np
import time

a = np.random.rand(80000)
a = np.cumsum(a)
d = 3

start_time1 = time.time()
for i in range(10000):
    b = a % d
    d += 0.001
end_time1 = time.time()

d = 3
start_time2 = time.time()
for i in range(10000):
    b = a - (d * np.floor(a / d))
    d += 0.001
end_time2 = time.time()

print((end_time1 - start_time1) / 10000)
print((end_time2 - start_time2) / 10000)
The output is:
0.0031344462633132934
0.00022937238216400147
When increasing the array size to 800,000:
0.014903099656105041
0.010498356819152833
(For this post I ran the code only once for the actual output; while trying to understand the issue I got these results consistently.)
While this solves my runtime problem, I have a hard time understanding why. Am I missing something? The only difference I can think of is the overhead of an additional function call, but the first case is pretty extreme (and 1.5x the runtime isn't good enough either), and if that were the case I would think that the existence of the np.remainder function would be pointless.
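Worth noting, as a check added here rather than part of the original post: the two expressions are not bit-for-bit equivalent, so np.remainder is not redundant even where the split form is faster. % computes the remainder exactly via fmod, while the floor-based shortcut can lose the residue to rounding:

import numpy as np

a, d = 1.0, 0.1
print(a % d)                    # 0.09999999999999995 (exact fmod-based remainder)
print(a - d * np.floor(a / d))  # 0.0, because a/d rounds to exactly 10.0 and the residue vanishes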
Edit: I tried testing the same code with non-numpy loops:
import numpy as np
import time

def pythonic_remainder(array, d):
    b = np.zeros(len(array))
    for i in range(len(array)):
        b[i] = array[i] % d
    return b

def split_pythonic_remainder(array, d):
    b = np.zeros(len(array))
    for i in range(len(array)):
        b[i] = array[i] - (d * np.floor(array[i] / d))
    return b

def split_remainder(a, d):
    return a - (d * np.floor(a / d))

def divide(array, iterations, action):
    d = 3
    for i in range(iterations):
        b = action(array, d)
        d += 0.001

a = np.random.rand(80000)
a = np.cumsum(a)

start_time = time.time()
divide(a, 10000, split_remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, np.remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, pythonic_remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, split_pythonic_remainder)
print((time.time() - start_time) / 10000)
The result I get is:
0.0003770533800125122
0.003932329940795899
0.018835473942756652
0.10940513386726379
I find it interesting that the opposite is true in the non-NumPy, element-by-element case.
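A plausible reason, sketched here as an assumption rather than something from the post: on a single Python float, % is one fast C-level operation, while the split form pays the dispatch overhead of a NumPy ufunc call (np.floor) on every element. A minimal micro-benchmark to check this:

import timeit

setup = 'import numpy as np; x = 123.456; d = 3.001'
# Builtin float remainder: a single C-level operation per call.
print(timeit.timeit('x % d', setup=setup, number=100000))
# Split form on a scalar: per-call ufunc dispatch overhead dominates.
print(timeit.timeit('x - d * np.floor(x / d)', setup=setup, number=100000))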
numpy.remainder() is another function for doing mathematical operations in NumPy. It returns the element-wise remainder of division between two arrays arr1 and arr2, i.e. arr1 % arr2. It returns 0 where arr2 is 0 and both arr1 and arr2 are (arrays of) integers.
NumPy provides an easy and flexible interface to optimized computation with arrays of data. Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs).
Each of these arithmetic operators is simply a convenient wrapper around a specific function built into NumPy; for example, the + operator is a wrapper for the add function, and % is a wrapper for the remainder function.
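As a concrete check (added here for illustration), % on an ndarray dispatches to np.remainder, so the timing code above really is comparing the remainder ufunc against the manual split:

import numpy as np

a = np.array([5.0, -5.0, 7.5])
print(a % 3)               # [2.  1.  1.5]
print(np.remainder(a, 3))  # identical: ndarray.__mod__ calls the remainder ufunc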
My best hypothesis is that your NumPy install is using an unoptimized fmod inside the % calculation. Here's why.
First, I can't reproduce your results on a normal pip-installed version of NumPy 1.15.1. I get only about a 10% performance difference (asdf.py contains your timing code):
$ python3.6 asdf.py
0.0006543657302856445
0.0006025806903839111
I can reproduce a major performance discrepancy with a manual build (python3.6 setup.py build_ext --inplace -j 4) of v1.15.1 from a clone of the NumPy Git repository, though:
$ python3.6 asdf.py
0.00242799973487854
0.0006397026300430298
This suggests that my pip-installed build's % is better optimized than my manual build or what you have installed.
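If you want to check what you're actually running (a step suggested here, not part of the original answer), NumPy can report its version and build configuration:

import numpy as np

print(np.__version__)  # e.g. 1.15.1
np.show_config()       # compiler and library details for this particular build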
Looking under the hood, it's tempting to look at the implementation of floating-point % in NumPy and blame the slowdown on the unnecessary floordiv calculation (npy_divmod@c@ calculates both // and %):
NPY_NO_EXPORT void
@TYPE@_remainder(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    BINARY_LOOP {
        const @type@ in1 = *(@type@ *)ip1;
        const @type@ in2 = *(@type@ *)ip2;
        npy_divmod@c@(in1, in2, (@type@ *)op1);
    }
}
but in my experiments, removing the floordiv provided no benefit. It looks easy enough for a compiler to optimize out, so maybe it was optimized out, or maybe it was just a negligible fraction of the runtime in the first place.
Rather than the floordiv, let's focus on just one line in npy_divmod@c@, the fmod call:
mod = npy_fmod@c@(a, b);
This is the initial remainder computation, before special-case handling and adjusting the result to match the sign of the right-hand operand. If we compare the performance of % with numpy.fmod on my manual build:
>>> import timeit
>>> import numpy
>>> a = numpy.arange(1, 8000, dtype=float)
>>> timeit.timeit('a % 3', globals=globals(), number=1000)
0.3510419335216284
>>> timeit.timeit('numpy.fmod(a, 3)', globals=globals(), number=1000)
0.33593094255775213
>>> timeit.timeit('a - 3*numpy.floor(a/3)', globals=globals(), number=1000)
0.07980139832943678
We see that fmod appears to be responsible for almost the entire runtime of %.
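For intuition, the sign-adjustment that npy_divmod@c@ performs after fmod can be sketched in Python (an illustrative approximation, not the actual implementation; the real C code also special-cases zeros, infs, and NaNs):

import numpy as np

def remainder_via_fmod(a, b):
    # fmod gives a result with the sign of a; Python-style % wants the sign of b.
    mod = np.fmod(a, b)
    wrong_sign = (mod != 0) & ((mod < 0) != (b < 0))
    # Where the signs disagree, shift the result by the divisor.
    return mod + np.where(wrong_sign, b, 0)

a = np.array([-5.0, 5.0, -5.0, 5.0])
b = np.array([ 3.0, 3.0, -3.0, -3.0])
print(remainder_via_fmod(a, b))  # [ 1.  2. -2. -1.]
print(np.remainder(a, b))        # matches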
I haven't disassembled the generated binary or stepped through it in an instruction-level debugger to see exactly what gets executed, and of course, I don't have access to your machine or your copy of NumPy. Still, from the above evidence, fmod seems like a pretty likely culprit.