 

Getting a range by rows in numpy

Tags:

python

numpy

I have a function that produces an array like this:

import numpy as np

my_array = np.array([list(str(i).zfill(4)) for i in range(10000)], dtype=int)

Which outputs:

array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 2],
       ...,
       [9, 9, 9, 7],
       [9, 9, 9, 8],
       [9, 9, 9, 9]])

As you can see, converting ints to strings and lists and then back to ints is highly inefficient, and what I really need is a much larger array (a larger range). I looked through numpy for a more efficient way to generate this array, but could not find one. The best I've got so far is arange, which gives a range from 0...9999 but not split into one digit per column.
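Row i of this array is just the zero-padded decimal digits of i, so the split that plain arange lacks can be done with integer division and modulo instead of string conversion. A minimal sketch, not taken from either post; `digits_array` is a hypothetical helper name, and it assumes every column uses the same base:

```python
import numpy as np

def digits_array(n_digits, base=10):
    # Every integer in [0, base**n_digits), one per row
    values = np.arange(base ** n_digits)
    # Place value of each column, most significant first: base**(n_digits-1), ..., base**0
    powers = base ** np.arange(n_digits - 1, -1, -1)
    # Broadcasting (N, 1) against (n_digits,) yields an (N, n_digits) array;
    # floor division shifts each digit into the ones place and % base isolates it
    return (values[:, None] // powers) % base

my_array = digits_array(4)
print(my_array.shape)  # (10000, 4)
```

This only covers the case where all columns share one range length; the answer below handles arbitrary per-column ranges.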

Any ideas?

Ofer Sadan asked Mar 06 '23 18:03



1 Answer

Here's one based on cartesian_product_broadcasted -

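import functools

import numpy as np

def cartesian_product_ranges(shape, out_dtype='int'):
    # One 1-D range [0, s) per output column
    arrays = [np.arange(s, dtype=out_dtype) for s in shape]
    # np.ix_ builds an open mesh; broadcast_arrays expands it into
    # N full arrays (views), each of shape `shape`
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    # rows = number of combinations, cols = number of ranges
    rows = functools.reduce(np.multiply, broadcasted[0].shape)
    cols = len(broadcasted)
    # Copy each flattened broadcasted array into its own block of `out`
    out = np.empty(rows * cols, dtype=out_dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    # `out` is laid out as (N, *shape): move the N axis last, then flatten
    # the rest so each row is one combination
    N = len(shape)
    return np.moveaxis(out.reshape((-1,) + tuple(shape)), 0, -1).reshape(-1, N)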
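For comparison, NumPy's np.indices builds the same N full index grids in one call, with grid[k] holding the k-th coordinate of every position. Flattening and transposing gives the same C-ordered rows. This is an alternative sketch, not the answer's method, and it has not been benchmarked here; `cartesian_product_indices` is a hypothetical name:

```python
import numpy as np

def cartesian_product_indices(shape, out_dtype=int):
    # grid has shape (N, *shape); grid[k] is the k-th coordinate at every position
    grid = np.indices(shape, dtype=out_dtype)
    # Flatten all positions in C order, put the N coordinates in columns,
    # and make the result row-major like the answer's output
    return np.ascontiguousarray(grid.reshape(len(shape), -1).T)

print(cartesian_product_indices([3, 2, 4]).shape)  # (24, 3)
```

Like the answer's function, this materializes all N grids, so its memory use grows with the full output size.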

Sample run -

In [116]: cartesian_product_ranges([3,2,4])
Out[116]: 
array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [0, 0, 3],
       [0, 1, 0],
       [0, 1, 1],
       [0, 1, 2],
       [0, 1, 3],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 0, 3],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 2],
       [1, 1, 3],
       [2, 0, 0],
       [2, 0, 1],
       [2, 0, 2],
       [2, 0, 3],
       [2, 1, 0],
       [2, 1, 1],
       [2, 1, 2],
       [2, 1, 3]])
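The rows come out in C (row-major) order: the last column varies fastest, so row k is the multi-index of flat position k in an array of the given shape. np.unravel_index computes exactly that mapping and accepts an array of flat positions, so a sketch like the following (hypothetical helper `rows_between`) can produce any slice of rows without building the whole table, which may help when the full range is too large to hold at once:

```python
import numpy as np

def rows_between(start, stop, shape):
    # Flat C-order positions start..stop-1
    flat = np.arange(start, stop)
    # unravel_index returns one index array per axis; stacking them as
    # columns gives the matching rows of the full cartesian-product table
    return np.stack(np.unravel_index(flat, shape), axis=1)

print(rows_between(4, 6, (3, 2, 4)))
# [[0 1 0]
#  [0 1 1]]
```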

Run and timings on a 10-ranged array with 4 columns -

In [119]: cartesian_product_ranges([10]*4)
Out[119]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 2],
       ...,
       [9, 9, 9, 7],
       [9, 9, 9, 8],
       [9, 9, 9, 9]])

In [120]: cartesian_product_ranges([10]*4).shape
Out[120]: (10000, 4)

In [121]: %timeit cartesian_product_ranges([10]*4)
10000 loops, best of 3: 105 µs per loop

In [122]: %timeit np.array([list(str(i).zfill(4)) for i in range(10000)], dtype=int)
100 loops, best of 3: 16.7 ms per loop

In [123]: 16700.0/105
Out[123]: 159.04761904761904

Around 160x speedup!

For a 10-ranged array with 9 columns, we can use the lower-precision uint8 dtype -

In [7]: %timeit cartesian_product_ranges([10]*9, out_dtype=np.uint8)
1 loop, best of 3: 3.36 s per loop
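The reason uint8 matters here is memory: with 9 columns of range 10 there are 10**9 rows, and the output alone needs rows × cols × itemsize bytes. A back-of-the-envelope sketch (hypothetical helper `table_nbytes`) that computes this without allocating anything:

```python
import numpy as np

def table_nbytes(shape, dtype):
    # rows = product of the range lengths, cols = number of ranges
    rows = int(np.prod(shape, dtype=np.int64))
    cols = len(shape)
    return rows * cols * np.dtype(dtype).itemsize

print(table_nbytes([10] * 9, np.uint8))  # 9000000000
print(table_nbytes([10] * 9, np.int64))  # 72000000000
```

So uint8 needs about 9 GB where int64 would need about 72 GB. Peak usage of cartesian_product_ranges is likely around twice the output size, since the final reshape after np.moveaxis has to copy the non-contiguous view.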
Divakar answered Mar 30 '23 01:03
