Efficient way to create an array that is a sequence of variable length ranges in numpy

Tags:

python

numpy

Suppose I have an array

import numpy as np
x=np.array([5,7,2])

I want to create an array that contains a sequence of ranges stacked together with the length of each range given by x:

y=np.hstack([np.arange(1,n+1) for n in x])

Is there some way to do this without the speed penalty of a list comprehension or looping. (x could be a very large array)

The result should be

y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2])

238

asked Aug 06 '13 15:08

dlm

2 Answers

You could use accumulation:

def my_sequences(x):
    x = x[x != 0] # you can skip this if you do not have 0s in x.

    # Create result array, filled with ones:
    y = np.cumsum(x, dtype=np.intp)
    a = np.ones(y[-1], dtype=np.intp)

    # Set all beginnings to - previous length:
    a[y[:-1]] -= x[:-1]

    # and just add it all up (btw. np.add.accumulate is equivalent):
    return np.cumsum(a, out=a) # here, in-place should be safe.

(One word of caution: If you result array would be larger then the possible size np.iinfo(np.intp).max this might with some bad luck return wrong results instead of erroring out cleanly...)

And because everyone always wants timings (compared to Ophion's) method:

In [11]: x = np.random.randint(0, 20, 1000000)

In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
1 loops, best of 3: 753 ms per loop

In [13]: %timeit my_sequences(x)
1 loops, best of 3: 191 ms per loop

of course the my_sequences function will not ill-perform when the values of x get large.

147

answered Oct 19 '22 23:10

seberg

First idea; prevent multiple calls to np.arange and concatenate should be much faster then hstack:

import numpy as np
x=np.array([5,7,2])

>>>a=np.arange(1,x.max()+1)
>>> np.hstack([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

>>> np.concatenate([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

If there are many nonunique values this seems more efficient:

>>>ua,uind=np.unique(x,return_inverse=True)
>>>a=[np.arange(1,k+1) for k in ua]
>>>np.concatenate(np.take(a,uind))

array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

Some timings for your case:

x=np.random.randint(0,20,1000000)

Original code

#Using hstack
%timeit np.hstack([np.arange(1,n+1) for n in x])
1 loops, best of 3: 7.46 s per loop

#Using concatenate
%timeit np.concatenate([np.arange(1,n+1) for n in x])
1 loops, best of 3: 5.27 s per loop

First code:

#Using hstack
%timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
1 loops, best of 3: 3.03 s per loop

#Using concatenate
%timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
10 loops, best of 3: 998 ms per loop

Second code:

%timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
10 loops, best of 3: 522 ms per loop

Looks like we gain a 14x speedup with the final code.

Small sanity check:

ua,uind=np.unique(x,return_inverse=True)
a=[np.arange(1,k+1) for k in ua]
out=np.concatenate(np.take(a,uind))

>>>out.shape
(9498409,)

>>>np.sum(x)
9498409

answered Oct 20 '22 01:10

Daniel

Related questions
                            
                                Arabic, Unicode and files in python
                            
                                Initializing 2D lists in Python: How to make deep copies of each row?
                            
                                Efficient python way for recursive equations
                            
                                getting object name and function name
                            
                                Running Jar files from Python
                            
                                'import sitecustomize' failed upon starting spyder
                            
                                What is the difference between syntax error and runtime error?
                            
                                Python splitting string by parentheses
                            
                                Is it me or is pygame.key.get_pressed() not working?
                            
                                python: Given a BytesIO buffer, generate img tag in html?
                            
                                How to efficiently serialize a scikit-learn classifier
                            
                                Python - Create Shortcut with arguments
                            
                                Can numpy einsum() perform outer addition?
                            
                                Using matplotlib, is it possible to set properties for all subplots on a figure at once?
                            
                                PyQt: TableWidget deselect all?
                            
                                Dealing with zeros in numpy array normalization
                            
                                Flask-SQLAlchemy join across 3 models and a Table construct
                            
                                Python daemon no pidfile
                            
                                How may I project vectors onto a plane defined by its orthogonal vector in Python?
                            
                                Is there a way to get the Pyramid json renderer to output formatted, pretty printed, output?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With