 

Remainder function (%) runtime on numpy arrays is far longer than manual remainder calculation

In the past few days I've been working on improving the runtime of a Python function that makes heavy use of the remainder operation (%), among other things. My main test case is an 80,000-element NumPy array (monotonically increasing), with 10,000 iterations, although I've tried various other sizes as well.

Eventually I reached a point where the remainder operation was a major bottleneck, and tried various solutions. This is the behaviour I found when running the following code:

```python
import numpy as np
import time

a = np.random.rand(80000)
a = np.cumsum(a)
d = 3
start_time1 = time.time()
for i in range(10000):
    b = a % d
    d += 0.001
end_time1 = time.time()
d = 3
start_time2 = time.time()
for i in range(10000):
    b = a - (d * np.floor(a / d))
    d += 0.001
end_time2 = time.time()
print((end_time1 - start_time1) / 10000)
print((end_time2 - start_time2) / 10000)
```

The output is:

```
0.0031344462633132934
0.00022937238216400147
```

When increasing the array size to 800,000:

```
0.014903099656105041
0.010498356819152833
```

(For this post I ran the code only once for the actual output; while trying to understand the issue I got these results consistently.)

While this solves my runtime problem, I have a hard time understanding why. Am I missing something? The only difference I can think of is the overhead of an additional function call, but the first case is pretty extreme (and a 1.5x runtime difference isn't good enough either), and if that were the cause I would think the existence of the np.remainder function is pointless.
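(Editor's note, not from the original post: the two formulations do agree on strictly positive data like this, up to float rounding, though they are not interchangeable in general. A minimal sanity check, assuming the same array setup as the timing code above:)

```python
import numpy as np

# Sanity check (assumes the setup from the timing code): on strictly
# positive data the split form matches %, up to float rounding.
# In general the two differ: np.remainder follows the sign of the
# divisor and special-cases d == 0, while a - d*np.floor(a/d) does not.
a = np.cumsum(np.random.rand(80000))
d = 3.0
assert np.allclose(a % d, a - d * np.floor(a / d))
```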

Edit: I tried testing the same code with non-numpy loops:

```python
import numpy as np
import time


def pythonic_remainder(array, d):
    b = np.zeros(len(array))
    for i in range(len(array)):
        b[i] = array[i] % d

def split_pythonic_remainder(array, d):
    b = np.zeros(len(array))
    for i in range(len(array)):
        b[i] = array[i] - (d * np.floor(array[i] / d))

def split_remainder(a, d):
    return a - (d * np.floor(a / d))

def divide(array, iterations, action):
    d = 3
    for i in range(iterations):
        b = action(array, d)
        d += 0.001

a = np.random.rand(80000)
a = np.cumsum(a)
start_time = time.time()
divide(a, 10000, split_remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, np.remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, pythonic_remainder)
print((time.time() - start_time) / 10000)

start_time = time.time()
divide(a, 10000, split_pythonic_remainder)
print((time.time() - start_time) / 10000)
```

The result I get is:

```
0.0003770533800125122
0.003932329940795899
0.018835473942756652
0.10940513386726379
```

I find it interesting that the opposite is true in the non-numpy case.
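(Editor's note, a plausible explanation rather than something verified in the post: in the element-wise loops, CPython evaluates a scalar float % as a single C-level operation, while the split form performs a divide, a floor, a multiply, and a subtract, and each np.floor call on a Python float adds ufunc-dispatch overhead on top. A rough micro-benchmark sketch, using math.floor to isolate the arithmetic cost:)

```python
import math
import timeit

# Hypothetical micro-benchmark: scalar % is one C-level operation,
# while the split form does several, so the vectorized ranking
# inverts in the element-wise case.
x, d = 1234.567, 3.001
t_mod = timeit.timeit('x % d', globals={'x': x, 'd': d}, number=100_000)
t_split = timeit.timeit('x - d * math.floor(x / d)',
                        globals={'x': x, 'd': d, 'math': math},
                        number=100_000)
print(t_mod, t_split)
```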

asked Sep 09 '18 by shaul




1 Answer

My best hypothesis is that your NumPy install is using an unoptimized fmod inside the % calculation. Here's why.


First, I can't reproduce your results on a normal pip-installed NumPy 1.15.1. I get only about a 10% performance difference (asdf.py contains your timing code):

```
$ python3.6 asdf.py
0.0006543657302856445
0.0006025806903839111
```

I can reproduce a major performance discrepancy with a manual build (python3.6 setup.py build_ext --inplace -j 4) of v1.15.1 from a clone of the NumPy Git repository, though:

```
$ python3.6 asdf.py
0.00242799973487854
0.0006397026300430298
```

This suggests that my pip-installed build's % is better optimized than my manual build or what you have installed.
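(Editor's note, a diagnostic sketch not from the original answer: NumPy can report its own build configuration, which hints at the compiler settings and libraries each install was compiled against, and can help compare a pip wheel with a local source build.)

```python
import numpy as np

# Print version and build configuration; a pip wheel and a local
# `setup.py build_ext` build will typically show different settings here.
print(np.__version__)
np.show_config()
```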


Looking under the hood, it's tempting to look at the implementation of floating-point % in NumPy and blame the slowdown on the unnecessary floordiv calculation (npy_divmod@c@ calculates both // and %):

```c
NPY_NO_EXPORT void
@TYPE@_remainder(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    BINARY_LOOP {
        const @type@ in1 = *(@type@ *)ip1;
        const @type@ in2 = *(@type@ *)ip2;
        npy_divmod@c@(in1, in2, (@type@ *)op1);
    }
}
```

but in my experiments, removing the floordiv provided no benefit. It looks easy enough for a compiler to optimize out, so maybe it was optimized out, or maybe it was just a negligible fraction of the runtime in the first place.

Rather than the floordiv, let's focus on just one line in npy_divmod@c@, the fmod call:

```c
mod = npy_fmod@c@(a, b);
```

This is the initial remainder computation, before special-case handling and adjusting the result to match the sign of the right-hand operand. If we compare the performance of % with numpy.fmod on my manual build:

```
>>> import timeit
>>> import numpy
>>> a = numpy.arange(1, 8000, dtype=float)
>>> timeit.timeit('a % 3', globals=globals(), number=1000)
0.3510419335216284
>>> timeit.timeit('numpy.fmod(a, 3)', globals=globals(), number=1000)
0.33593094255775213
>>> timeit.timeit('a - 3*numpy.floor(a/3)', globals=globals(), number=1000)
0.07980139832943678
```

We see that fmod appears to be responsible for almost the entire runtime of %.
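(Editor's note: the sign-fixup that % applies on top of fmod is easy to see on mixed-sign inputs, consistent with the npy_divmod@c@ description above.)

```python
import numpy as np

# fmod keeps the sign of the dividend; % (np.remainder) is fmod plus
# an adjustment so the result takes the sign of the divisor.
a = np.array([7.0, -7.0])
print(np.fmod(a, 3.0))       # [ 1. -1.]
print(np.remainder(a, 3.0))  # [ 1.  2.]
```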


I haven't disassembled the generated binary or stepped through it in an instruction-level debugger to see exactly what gets executed, and of course, I don't have access to your machine or your copy of NumPy. Still, from the above evidence, fmod seems like a pretty likely culprit.

answered Oct 02 '22 by user2357112 supports Monica