Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to create an array that is a sequence of variable length ranges in numpy

Tags:

python

numpy

Suppose I have an array

import numpy as np
x=np.array([5,7,2])

I want to create an array that contains a sequence of ranges stacked together with the length of each range given by x:

y=np.hstack([np.arange(1,n+1) for n in x])

Is there some way to do this without the speed penalty of a list comprehension or looping. (x could be a very large array)

The result should be

y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2])
like image 238
dlm Avatar asked Aug 06 '13 15:08

dlm


People also ask

How do I create a NumPy sequence?

Problem: How to create a sequence of linearly increasing values? Solution: Use NumPy's arange() function. The np. arange([start,] stop[, step]) function creates a new NumPy array with evenly-spaced integers between start (inclusive) and stop (exclusive).

Are NumPy arrays more efficient?

Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in Numpy is around 5 to 100 times faster than the standard python list, which is a significant leap in terms of speed.

How do you create an equally spaced array in Python?

NumPy has a built-in method called arange() which is capable of creating an array of a given integer type(in bytes), with equal spacing between the numbers. Parameters: start: This is a default parameter. The value of the first number of the element, the default value of this parameter is 0.


2 Answers

You could use accumulation:

def my_sequences(x):
    x = x[x != 0] # you can skip this if you do not have 0s in x.

    # Create result array, filled with ones:
    y = np.cumsum(x, dtype=np.intp)
    a = np.ones(y[-1], dtype=np.intp)

    # Set all beginnings to - previous length:
    a[y[:-1]] -= x[:-1]

    # and just add it all up (btw. np.add.accumulate is equivalent):
    return np.cumsum(a, out=a) # here, in-place should be safe.

(One word of caution: If you result array would be larger then the possible size np.iinfo(np.intp).max this might with some bad luck return wrong results instead of erroring out cleanly...)

And because everyone always wants timings (compared to Ophion's) method:

In [11]: x = np.random.randint(0, 20, 1000000)

In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
1 loops, best of 3: 753 ms per loop

In [13]: %timeit my_sequences(x)
1 loops, best of 3: 191 ms per loop

of course the my_sequences function will not ill-perform when the values of x get large.

like image 147
seberg Avatar answered Oct 19 '22 23:10

seberg


First idea; prevent multiple calls to np.arange and concatenate should be much faster then hstack:

import numpy as np
x=np.array([5,7,2])

>>>a=np.arange(1,x.max()+1)
>>> np.hstack([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

>>> np.concatenate([a[:k] for k in x])
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

If there are many nonunique values this seems more efficient:

>>>ua,uind=np.unique(x,return_inverse=True)
>>>a=[np.arange(1,k+1) for k in ua]
>>>np.concatenate(np.take(a,uind))

array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

Some timings for your case:

x=np.random.randint(0,20,1000000) 

Original code

#Using hstack
%timeit np.hstack([np.arange(1,n+1) for n in x])
1 loops, best of 3: 7.46 s per loop

#Using concatenate
%timeit np.concatenate([np.arange(1,n+1) for n in x])
1 loops, best of 3: 5.27 s per loop

First code:

#Using hstack
%timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
1 loops, best of 3: 3.03 s per loop

#Using concatenate
%timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
10 loops, best of 3: 998 ms per loop

Second code:

%timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
10 loops, best of 3: 522 ms per loop

Looks like we gain a 14x speedup with the final code.

Small sanity check:

ua,uind=np.unique(x,return_inverse=True)
a=[np.arange(1,k+1) for k in ua]
out=np.concatenate(np.take(a,uind))

>>>out.shape
(9498409,)

>>>np.sum(x)
9498409
like image 35
Daniel Avatar answered Oct 20 '22 01:10

Daniel