Is there a faster alternative to np.diff?

I'm concerned with the speed of the following function:

def cch(tau):
    return np.sum(abs(-1*np.diff(cartprod)-tau)<0.001)

where "cartprod" is a NumPy array that looks like this:

cartprod = np.array([[0.0123, 0.0123], [0.0123, 0.0459], ...])

The array has about 25 million rows. Basically, I'm looking for a significantly faster way to compute the difference within every pair in that array. Is there an algorithm or function that's faster than np.diff, or is np.diff the be-all and end-all? I'm also open to anything else.
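For readers following along, here is a minimal sketch of what np.diff does to an array of pairs, and of the same count written with plain column slicing. The three-row array and the name cch_slices are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Tiny stand-in for the ~25-million-row array in the question (made-up values).
cartprod = np.array([[0.0123, 0.0123],
                     [0.0123, 0.0459],
                     [0.0500, 0.0400]])

# np.diff works along the last axis, so each row [a, b] becomes [b - a].
via_diff = np.diff(cartprod)                  # shape (3, 1)

# The same numbers from plain column slicing, as a flat array.
via_slices = cartprod[:, 1] - cartprod[:, 0]  # shape (3,)

def cch_slices(tau, pairs, tol=0.001):
    """Count rows whose (first - second) lies within tol of tau.

    Hypothetical helper: a - b - tau is the same quantity as
    -1 * np.diff(pairs) - tau in the question's cch.
    """
    return int(np.count_nonzero(np.abs(pairs[:, 0] - pairs[:, 1] - tau) < tol))
```

This only restates the computation; whether it is faster than np.diff on the full array would need to be timed.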

EDIT: Thank you all for your solutions!

asked Oct 25 '18 02:10

Matthew K

1 Answer

I think the bottleneck is not np.diff itself but the temporary arrays of length ~25 million that the expression allocates along the way: np.diff, the multiplication by -1, the subtraction of tau, abs, and the comparison each return a new array. I wrote an equivalent function that iterates over the rows and tallies the result as it goes, so no intermediate arrays are created. The function needs to be jitted with numba to be fast. I hope that is acceptable.

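import numpy as np
from numba import jit

arr = np.random.rand(25000000, 2)

# Original vectorized version, with cartprod passed in as an argument.
# (%timeit is an IPython/Jupyter magic, not plain Python.)
def cch(tau, cartprod):
    return np.sum(abs(-1*np.diff(cartprod)-tau)<0.001)
%timeit cch(0.01, arr)

# Single pass over the rows that tallies matches as it goes.
@jit(nopython=True)
def cch_jit(tau, cartprod):
    count = 0
    tau = -tau  # so that tau - diff below equals -1*diff - tau in cch
    for i in range(cartprod.shape[0]):
        count += np.less(np.abs(tau - (cartprod[i, 1] - cartprod[i, 0])), 0.001)
    return count
%timeit cch_jit(0.01, arr)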

produces

294 ms ± 2.82 ms    (cch)
42.7 ms ± 483 µs    (cch_jit)

which is roughly 7 times faster (294 ms / 42.7 ms ≈ 6.9).
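If numba is not an option, a NumPy-only variant may still cut down on large temporaries by working on blocks of rows and reusing one buffer in place. The following is a rough sketch under that assumption; the name cch_chunked and the chunk_rows default are hypothetical, and it has not been timed against the numbers above.

```python
import numpy as np

def cch_chunked(tau, pairs, tol=0.001, chunk_rows=1_000_000):
    """Same count as the vectorized cch, computed one block of rows at a time."""
    count = 0
    for start in range(0, pairs.shape[0], chunk_rows):
        block = pairs[start:start + chunk_rows]
        # First column minus second column equals -1 * np.diff(block).
        d = block[:, 0] - block[:, 1]
        d -= tau              # in place, no new array
        np.abs(d, out=d)      # in place as well
        count += int(np.count_nonzero(d < tol))
    return count
```

Each block still allocates one float buffer and one boolean mask, but their size is bounded by chunk_rows rather than by the full 25 million rows. The chunk size is a tuning knob worth measuring on the actual data.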

answered Oct 21 '22 04:10

alexdor

Tags: performance, python, time-complexity, numpy
