How to efficiently concatenate many arange calls in numpy?

I'd like to vectorize calls like numpy.arange(0, cnt_i) over a vector of cnt values and concatenate the results, as in this snippet:

import numpy
cnts = [1,2,3]
numpy.concatenate([numpy.arange(cnt) for cnt in cnts])

array([0, 0, 1, 0, 1, 2])

Unfortunately, the code above is very memory-inefficient because of the temporary arrays and the list-comprehension loop.

Is there a way to do this more efficiently in numpy?

asked Nov 17 '13 06:11

Joseph Hastings

3 Answers

Here's a completely vectorized function:

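import numpy as np

def multirange(counts):
    counts = np.asarray(counts)
    # Remove the following line if counts is always strictly positive.
    counts = counts[counts != 0]
    if counts.size == 0:
        # Nothing to generate; also avoids indexing an empty array below.
        return np.empty(0, dtype=int)

    # Index in the output where each range after the first starts.
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)

    # Increments of +1 everywhere, except at the start of each range,
    # where the increment drops the running value back to 0.
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1

    # Reuse the incr array for the final result.
    incr.cumsum(out=incr)
    return incr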
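
The same idea can also be written with np.repeat. The following is only a sketch of that alternative, not part of the function above; the name multirange_repeat is made up for illustration. It builds one global arange and subtracts from each element the offset at which its block starts.

```python
import numpy as np

def multirange_repeat(counts):
    # Hypothetical helper for illustration; not from the original answer.
    counts = np.asarray(counts, dtype=int)
    total = counts.sum()
    # Offset of each block within the flat output: [0, c0, c0+c1, ...].
    starts = np.cumsum(counts) - counts
    # Global position minus the block's offset gives 0..count-1 per block.
    return np.arange(total) - np.repeat(starts, counts)

print(multirange_repeat([1, 2, 3]))  # [0 0 1 0 1 2]
```

Zero counts need no special handling here, because np.repeat simply emits nothing for them. The cost is one extra temporary array of the same length as the output.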

Here's a variation of @Developer's answer that only calls arange once:

def multirange_loop(counts):
    counts = np.asarray(counts)
    ranges = np.empty(counts.sum(), dtype=int)
    seq = np.arange(counts.max())
    starts = np.zeros(len(counts), dtype=int)
    starts[1:] = np.cumsum(counts[:-1])
    for start, count in zip(starts, counts):
        ranges[start:start + count] = seq[:count]
    return ranges

And here's the original version, written as a function:

def multirange_original(counts):
    ranges = np.concatenate([np.arange(count) for count in counts])
    return ranges

Demo:

In [296]: multirange_original([1,2,3])
Out[296]: array([0, 0, 1, 0, 1, 2])

In [297]: multirange_loop([1,2,3])
Out[297]: array([0, 0, 1, 0, 1, 2])

In [298]: multirange([1,2,3])
Out[298]: array([0, 0, 1, 0, 1, 2])

Compare timing using a larger array of counts:

In [299]: counts = np.random.randint(1, 50, size=50)

In [300]: %timeit multirange_original(counts)
10000 loops, best of 3: 114 µs per loop

In [301]: %timeit multirange_loop(counts)
10000 loops, best of 3: 76.2 µs per loop

In [302]: %timeit multirange(counts)
10000 loops, best of 3: 26.4 µs per loop
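
To reproduce a comparison like this outside IPython, something along the following lines should work with the standard timeit module. It is a sketch: it re-declares two of the functions so it runs on its own, and the seed and repetition counts are arbitrary assumptions. Absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def multirange_original(counts):
    return np.concatenate([np.arange(count) for count in counts])

def multirange(counts):
    counts = np.asarray(counts)
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr

rng = np.random.default_rng(0)  # arbitrary seed, chosen for repeatability
counts = rng.integers(1, 50, size=50)

# Check that both functions agree before timing them.
same = np.array_equal(multirange_original(counts), multirange(counts))
print("results match:", same)

for fn in (multirange_original, multirange):
    # Best of 3 repeats of 1000 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(counts), number=1000, repeat=3)) / 1000
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```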

answered Oct 19 '22 22:10

Warren Weckesser

Try the following to address the memory problem; the speed is almost the same.

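# Preallocate an integer array; np.empty defaults to float otherwise.
out = np.empty(sum(cnts), dtype=int)
k = 0
for cnt in cnts:
    # Fill the next cnt slots with 0..cnt-1.
    out[k:k+cnt] = np.arange(cnt)
    k += cnt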

This way no concatenation is needed, and the only temporary is one small arange per iteration.

answered Oct 19 '22 21:10

Developer

For the special case where the counts are 0, 1, ..., c-1, np.tril_indices pretty much does this for you:

In [28]: def f(c):
   ....:     return np.tril_indices(c, -1)[1]

In [29]: f(10)
Out[29]:
array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1,
       2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 8])

In [33]: %timeit multirange(range(10))
10000 loops, best of 3: 93.2 us per loop

In [34]: %timeit f(10)
10000 loops, best of 3: 68.5 us per loop

This is somewhat faster than @Warren Weckesser's multirange when the dimension is small (68.5 µs vs. 93.2 µs).

But it becomes much slower when the dimension is larger (@hpaulj, you have a very good point):

In [36]: %timeit multirange(range(1000))
100 loops, best of 3: 5.62 ms per loop

In [37]: %timeit f(1000)
10 loops, best of 3: 68.6 ms per loop
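
A quick way to convince yourself that f(c) matches the question's approach for counts 0, 1, ..., c-1 is the check below. It is a sketch; the reference function is just the original concatenate version, and its name is chosen here for illustration.

```python
import numpy as np

def f(c):
    # Column indices of the strictly lower triangle of a c-by-c matrix.
    return np.tril_indices(c, -1)[1]

def reference(c):
    # The question's approach applied to counts 0, 1, ..., c-1.
    return np.concatenate([np.arange(i) for i in range(c)])

for c in (2, 5, 10):
    assert np.array_equal(f(c), reference(c))

print(f(4))  # [0 0 1 0 1 2]
```

A likely reason for the slowdown at large c is that the triangle is selected from a full c-by-c grid, so the work grows with c squared even before the result is extracted. That explanation is an assumption and was not measured here.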

answered Oct 19 '22 22:10

CT Zhu

Tags:

python

arrays

vectorization

numpy