I have a very large NumPy array (length ~150 million) with very few nonzero values (about 99.9% of the array is 0). I want to shuffle it, but the shuffle is slow: it takes about 10 seconds, which is not acceptable because I am running Monte Carlo simulations. Is there a way to shuffle it that takes advantage of the fact that the array is mostly zeros?
I am thinking of shuffling just the nonzero values and then inserting them at random positions in an array full of zeros, but I cannot find a NumPy function for that.
Approach #1 : Here's one approach -

    import numpy as np

    def shuffle_sparse_arr(a):
        out = np.zeros_like(a)
        mask = a != 0
        n = np.count_nonzero(mask)
        # Draw n distinct positions, in random order, for the nonzero values
        idx = np.random.choice(a.size, n, replace=False)
        out[idx] = a[mask]
        return out
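A possible variant of the same idea, assuming NumPy 1.17 or later, uses the newer Generator API. The function name shuffle_sparse_arr_rng and the optional rng parameter are made up for this sketch. As far as I know, Generator.choice with replace=False does not need to permute the whole index range when the sample is small relative to the population, so it may be faster than the legacy call above, but it is worth timing on your own data.

```python
import numpy as np

def shuffle_sparse_arr_rng(a, rng=None):
    """Hypothetical variant of shuffle_sparse_arr using np.random.Generator.

    Assumes `a` is a 1-D array.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros_like(a)
    vals = a[a != 0]
    # Distinct positions, returned in random order (shuffle=True is the default)
    idx = rng.choice(a.size, size=vals.size, replace=False)
    out[idx] = vals
    return out
```

Passing a seeded generator, for example np.random.default_rng(0), makes a simulation run reproducible.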
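Approach #2 : Hackish way -

    def shuffle_sparse_arr_hackish(a):
        out = np.zeros_like(a)
        mask = a != 0
        n = np.count_nonzero(mask)
        # Oversample 2*n positions and drop duplicates; redraw in the rare
        # case that fewer than n distinct positions survive
        idx = np.unique((a.size*np.random.rand(2*n)).astype(int))
        while idx.size < n:
            idx = np.unique((a.size*np.random.rand(2*n)).astype(int))
        # np.unique returns sorted values, so shuffle before keeping n of
        # them; truncating first would keep only the n smallest positions
        np.random.shuffle(idx)
        out[idx[:n]] = a[mask]
        return out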
Sample runs -
In [269]: # Setup input array
     ...: a = np.zeros((20),dtype=int)
     ...: sidx = np.random.choice(a.size, 6, replace=0)
     ...: a[sidx] = [5,8,4,1,7,3]
     ...:
In [270]: a
Out[270]: array([4, 0, 0, 8, 0, 0, 5, 0, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 3])
In [271]: shuffle_sparse_arr(a)
Out[271]: array([0, 5, 0, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 7, 3, 8, 0, 0])
In [272]: shuffle_sparse_arr_hackish(a)
Out[272]: array([3, 1, 5, 0, 4, 0, 7, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0])
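Before using any of these in a simulation, it may be worth checking two things: that the shuffled array holds exactly the same nonzero values, and that their positions are spread evenly over the array. Below is a minimal sketch of such a check. The helper name check_shuffle is made up for illustration, and np.random.permutation is used only as a baseline; pass in the function you actually want to test.

```python
import numpy as np

def check_shuffle(shuffle_fn, a, trials=200):
    """Hypothetical helper: mean relative position of the nonzeros over many runs."""
    expected = np.sort(a[a != 0])
    positions = []
    for _ in range(trials):
        out = shuffle_fn(a)
        # The shuffled array must hold exactly the same nonzero values
        assert np.array_equal(np.sort(out[out != 0]), expected)
        # Relative positions (0 = start of array, 1 = end) of the nonzeros
        positions.append(np.flatnonzero(out) / a.size)
    return float(np.concatenate(positions).mean())

np.random.seed(0)
a = np.zeros(1000, dtype=int)
a[:10] = np.arange(1, 11)

# Baseline: a full permutation; swap in the function under test
mean_pos = check_shuffle(np.random.permutation, a)
print(mean_pos)
```

For an unbiased shuffle the printed value should be close to 0.5. A value well below that would mean the nonzeros cluster near the start of the array.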
Runtime test -
In [288]: # Set up a 15-million-element input array with 99.9% zeros
     ...: a = np.zeros((15000000),dtype=int)
     ...:
     ...: # Fill the remaining 0.1% of positions with random values
     ...: n = int(a.size*((100-99.9)/100))
     ...:
     ...: set_idx = np.random.choice(a.size, n , replace=0)
     ...: nums = np.random.choice(a.size, n , replace=0)
     ...: a[set_idx] = nums
     ...:
In [289]: %timeit shuffle_sparse_arr(a)
1 loops, best of 3: 647 ms per loop
In [290]: %timeit shuffle_sparse_arr_hackish(a)
10 loops, best of 3: 29.1 ms per loop
In [291]: %timeit np.random.shuffle(a)
1 loops, best of 3: 606 ms per loop
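To repeat this kind of comparison outside IPython, the standard timeit module can stand in for %timeit. The sketch below is only an illustration: best_time and sparse_shuffle are made-up names, it uses the Generator API (NumPy 1.17 or later) rather than the functions timed above, and the array is ten times smaller so that it runs quickly. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def best_time(fn, *args, repeat=3):
    """Hypothetical helper: best wall-clock time in seconds over `repeat` calls."""
    return min(timeit.repeat(lambda: fn(*args), number=1, repeat=repeat))

def sparse_shuffle(a, rng):
    # Place only the nonzero values at randomly drawn distinct positions
    out = np.zeros_like(a)
    vals = a[a != 0]
    out[rng.choice(a.size, size=vals.size, replace=False)] = vals
    return out

rng = np.random.default_rng(0)
a = np.zeros(1_500_000, dtype=int)   # smaller than the 15 million used above
n = a.size // 1000                   # 0.1% nonzeros
a[rng.choice(a.size, size=n, replace=False)] = rng.integers(1, a.size, size=n)

t_full = best_time(rng.shuffle, a)             # in-place full shuffle
t_sparse = best_time(sparse_shuffle, a, rng)   # sparse placement
print(f"full shuffle: {t_full:.4f} s, sparse placement: {t_sparse:.4f} s")
```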