I have a function that produces an array like this:
my_array = np.array([list(str(i).zfill(4)) for i in range(10000)], dtype=int)
Which outputs:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 0, 2],
...,
[9, 9, 9, 7],
[9, 9, 9, 8],
[9, 9, 9, 9]])
As you can see, converting ints to strings and lists and then back to int is highly inefficient, and my real need is for a much larger array (a larger range). I looked through NumPy for a more efficient way to generate this array but could not find one. The best I've got so far is arange, which gives a range from 1...9999 but not separated into lists.
Any ideas?
Here's one based on cartesian_product_broadcasted -
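import functools
import numpy as np

def cartesian_product_ranges(shape, out_dtype='int'):
    # One 1-D range per column, e.g. shape=[3, 2, 4] -> arange(3), arange(2), arange(4)
    arrays = [np.arange(s, dtype=out_dtype) for s in shape]
    # Reshape the ranges so that they broadcast against one another
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), \
                 len(broadcasted)
    out = np.empty(rows * cols, dtype=out_dtype)
    # Copy each broadcasted grid into its own contiguous chunk of the output
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    N = len(shape)
    # Chunks are laid out as (N, *shape); move that axis last to get one row per combination
    return np.moveaxis(out.reshape((-1,) + tuple(shape)), 0, -1).reshape(-1, N)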
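As a cross-check, here is a minimal sketch of a shorter alternative built on np.indices rather than the broadcasting approach above. The name digits_via_indices is made up for illustration, and no timings are claimed for it; it only shows that the same row layout can be obtained by moving the leading axis of the index grid to the end.

```python
import numpy as np

def digits_via_indices(shape, out_dtype=int):
    # Hypothetical helper, not part of the answer above.
    # np.indices returns an array of shape (N, *shape) holding one index grid per axis.
    grid = np.indices(shape, dtype=out_dtype)
    # Move the grid axis last and flatten: one row per index combination, in C order.
    return np.moveaxis(grid, 0, -1).reshape(-1, len(shape))

# Compare against the string-based construction from the question.
expected = np.array([list(str(i).zfill(4)) for i in range(10000)], dtype=int)
result = digits_via_indices([10] * 4)
print(result.shape, np.array_equal(result, expected))
```

Because np.indices materialises the full grid before the reshape, this sketch may need more temporary memory than the chunked copy above for very large shapes.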
Sample run -
In [116]: cartesian_product_ranges([3,2,4])
Out[116]:
array([[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
[0, 1, 3],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 0, 3],
[1, 1, 0],
[1, 1, 1],
[1, 1, 2],
[1, 1, 3],
[2, 0, 0],
[2, 0, 1],
[2, 0, 2],
[2, 0, 3],
[2, 1, 0],
[2, 1, 1],
[2, 1, 2],
[2, 1, 3]])
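One way to read the output above: row i holds the mixed-radix digits of i for the given shape, which is what np.unravel_index computes. The sketch below, with the illustrative name rows_from_flat_index, rebuilds the same rows from the flat row numbers; it is meant as a sanity check, not as a faster method.

```python
import math
import numpy as np

def rows_from_flat_index(shape):
    # Hypothetical helper: unravel every flat row number 0..prod(shape)-1
    # into its per-axis coordinates (C order), then stack them as columns.
    flat = np.arange(math.prod(shape))
    return np.stack(np.unravel_index(flat, tuple(shape)), axis=1)

rows = rows_from_flat_index((3, 2, 4))
print(rows.shape)       # (24, 3)
print(rows[5].tolist())  # [0, 1, 1], since 5 = 0*8 + 1*4 + 1
```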
Run and timings on a 10-ranged array with 4 columns -
In [119]: cartesian_product_ranges([10]*4)
Out[119]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 0, 2],
...,
[9, 9, 9, 7],
[9, 9, 9, 8],
[9, 9, 9, 9]])
In [120]: cartesian_product_ranges([10]*4).shape
Out[120]: (10000, 4)
In [121]: %timeit cartesian_product_ranges([10]*4)
10000 loops, best of 3: 105 µs per loop
In [122]: %timeit np.array([list(str(i).zfill(4)) for i in range(10000)], dtype=int)
100 loops, best of 3: 16.7 ms per loop
In [123]: 16700.0/105
Out[123]: 159.04761904761904
Around 160x speedup!
For a 10-ranged array with 9 columns, we can use the lower-precision uint8 dtype -
In [7]: %timeit cartesian_product_ranges([10]*9, out_dtype=np.uint8)
1 loop, best of 3: 3.36 s per loop
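A rough back-of-the-envelope sketch of why the smaller dtype matters at this size: the output has 10**9 rows and 9 columns, so its footprint scales directly with the item size. The helper name estimated_nbytes is an assumption for illustration, and the figure covers only the final array, not any temporaries.

```python
import math
import numpy as np

def estimated_nbytes(shape, dtype):
    # Hypothetical helper: rows * columns * bytes per element of the output array.
    rows = math.prod(shape)   # Python ints, so no overflow
    cols = len(shape)
    return rows * cols * np.dtype(dtype).itemsize

shape = [10] * 9
print(estimated_nbytes(shape, np.uint8))  # 9000000000  (about 9 GB)
print(estimated_nbytes(shape, np.int64))  # 72000000000 (about 72 GB)
```

Since every entry is a digit in 0...9, uint8 holds the values without loss while using one eighth of the memory of a 64-bit integer dtype.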