Is there a faster alternative to np.diff?

I'm concerned with the speed of the following function:

def cch(tau):
    return np.sum(abs(-1*np.diff(cartprod)-tau)<0.001)

Where "cartprod" is a NumPy array that looks like this:

cartprod = np.array([[0.0123,0.0123],[0.0123,0.0459],...])

The array has about 25 million rows. Basically, I'm trying to find a significantly faster way to compute the difference for every pair in that array. Is there an algorithm or function that's faster than np.diff? Or is np.diff the be-all and end-all? I'm also open to anything else.
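For a two-column array, np.diff along its default (last) axis amounts to subtracting the first column from the second. The sketch below illustrates that on a small made-up array; the name `pairs`, its values, the helper `cch_columns`, and the `tol` parameter are assumptions for illustration only, and no speedup is claimed for it.

```python
import numpy as np

# Small illustrative stand-in for the 25-million-row array (values are made up).
pairs = np.array([[0.0123, 0.0123],
                  [0.0123, 0.0459],
                  [0.0500, 0.0400]])

# np.diff works along the last axis by default, so the result has shape (N, 1).
via_diff = np.diff(pairs)

# The same differences as a flat array of shape (N,), via column subtraction.
via_columns = pairs[:, 1] - pairs[:, 0]

def cch_columns(tau, cartprod, tol=0.001):
    """Count rows where -(second - first) lies within tol of tau."""
    delta = cartprod[:, 0] - cartprod[:, 1]  # equals -1 * np.diff(cartprod)
    return int(np.count_nonzero(np.abs(delta - tau) < tol))
```

Column subtraction skips the extra (N, 1) axis and the multiplication by -1, but it still allocates full-length temporaries, so any gain would need to be measured on the real data.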

EDIT: Thank you all for your solutions!

Matthew K asked Oct 25 '18 02:10



1 Answer

I think the bottleneck is not np.diff itself but the repeated creation of temporary arrays of length ~25 million. I wrote an equivalent function that iterates over the array and tallies the results as it goes. It needs to be JIT-compiled with numba to be fast; I hope that is acceptable.

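import numpy as np
from numba import jit

arr = np.random.rand(25000000, 2)

# Original NumPy version: allocates several temporary arrays of ~25 million elements
def cch(tau, cartprod):
    return np.sum(abs(-1*np.diff(cartprod)-tau)<0.001)
%timeit cch(0.01, arr)

# Compiled loop: tallies matches row by row without building temporary arrays
@jit(nopython=True)
def cch_jit(tau, cartprod):
    count = 0
    tau = -tau
    for i in range(cartprod.shape[0]):
        count += np.less(np.abs(tau - (cartprod[i, 1]- cartprod[i, 0])), 0.001)
    return count
%timeit cch_jit(0.01, arr)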

produces

294 ms ± 2.82 ms 
42.7 ms ± 483 µs 

which is almost 7 times faster.
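If numba is not an option, the temporary-array cost mentioned at the start of this answer can also be reduced in plain NumPy by processing the rows in chunks, so that each intermediate array stays small and is reused in place. This is only a sketch: the function name `cch_chunked`, the `tol` parameter, and the chunk size are assumptions, and it has not been timed against the two versions above.

```python
import numpy as np

def cch_chunked(tau, cartprod, tol=0.001, chunk=1_000_000):
    """Count rows where -(col1 - col0) lies within tol of tau, chunk by chunk."""
    count = 0
    for start in range(0, cartprod.shape[0], chunk):
        block = cartprod[start:start + chunk]   # a view, no copy
        delta = block[:, 0] - block[:, 1]       # one chunk-sized temporary
        delta -= tau                            # reuse the same buffer in place
        np.abs(delta, out=delta)
        count += int(np.count_nonzero(delta < tol))
    return count

# Small random sample to compare against the original expression.
rng = np.random.default_rng(0)
sample = rng.random((10_000, 2))
reference = int(np.sum(np.abs(-1 * np.diff(sample) - 0.01) < 0.001))
```

A call such as cch_chunked(0.01, arr) should return the same count as cch(0.01, arr). Whether it is actually faster depends on the chunk size and the machine's cache, so it is worth timing before adopting it.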

alexdor answered Oct 21 '22 04:10
