
Python: shuffle an array that has very few non-zeros (very sparse)

I have a very large NumPy array (length ~150 million) in which about 99.9% of the entries are 0. I want to shuffle it, but the shuffle is slow: it takes about 10 seconds, which is not acceptable because I am running Monte Carlo simulations. Is there a way to shuffle it that takes advantage of the fact that the array is mostly zeros?

I am thinking of shuffling just my positive values and then inserting them at random positions in an array full of 0's, but I cannot find a NumPy function for that.

BillyBoy asked Mar 10 '23 09:03


1 Answer

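Approach #1: Pick the target positions by sampling without replacement, then scatter the non-zero values into an array of zeros -

import numpy as np

def shuffle_sparse_arr(a):
    out = np.zeros_like(a)
    mask = a != 0
    n = np.count_nonzero(mask)
    # n distinct target positions, returned in random order
    idx = np.random.choice(a.size, n, replace=False)
    out[idx] = a[mask]
    return out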

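Approach #2: A hackish way that oversamples random positions and keeps only the distinct ones -

def shuffle_sparse_arr_hackish(a):
    out = np.zeros_like(a)
    mask = a != 0
    n = np.count_nonzero(mask)
    # Draw 2*n candidate positions and drop duplicates; redraw in the
    # rare case that fewer than n distinct positions remain
    cand = np.unique((a.size*np.random.rand(2*n)).astype(int))
    while cand.size < n:
        cand = np.unique((a.size*np.random.rand(2*n)).astype(int))
    # np.unique returns sorted values, so permute before truncating;
    # taking the first n directly would favor low indices
    idx = np.random.permutation(cand)[:n]
    out[idx] = a[mask]
    return out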

Sample runs -

In [269]: # Setup input array
     ...: a = np.zeros((20),dtype=int)
     ...: sidx = np.random.choice(a.size, 6, replace=0)
     ...: a[sidx] = [5,8,4,1,7,3]
     ...: 

In [270]: a
Out[270]: array([4, 0, 0, 8, 0, 0, 5, 0, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 3])

In [271]: shuffle_sparse_arr(a)
Out[271]: array([0, 5, 0, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 7, 3, 8, 0, 0])

In [272]: shuffle_sparse_arr_hackish(a)
Out[272]: array([3, 1, 5, 0, 4, 0, 7, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0])
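A quick sanity check on the sample run above, sketched under the assumption that a valid shuffle must keep exactly the same multiset of values and the same shape. The helper name `is_permutation_of` is hypothetical; the three arrays are copied from the session output.

```python
import numpy as np

# Arrays copied from the sample run above
a = np.array([4, 0, 0, 8, 0, 0, 5, 0, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 3])
out1 = np.array([0, 5, 0, 0, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0, 0, 7, 3, 8, 0, 0])
out2 = np.array([3, 1, 5, 0, 4, 0, 7, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0])

def is_permutation_of(original, shuffled):
    """Return True if `shuffled` has the same shape and the same values
    (counting repeats) as `original`. Hypothetical helper for illustration."""
    if original.shape != shuffled.shape:
        return False
    # Sorting both flattened arrays makes the comparison order-independent
    return np.array_equal(np.sort(original, axis=None),
                          np.sort(shuffled, axis=None))

print(is_permutation_of(a, out1))
print(is_permutation_of(a, out2))
```

This only confirms that no value was lost or duplicated; it says nothing about whether the positions are uniformly distributed.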

Runtime test -

In [288]: # Setup input array with 15 million and 99.9% zeros
     ...: a = np.zeros((15000000),dtype=int)
     ...: 
     ...: # Set the remaining 0.1% as random non-zeros
     ...: n = int(a.size*((100-99.9)/100)) 
     ...: 
     ...: set_idx = np.random.choice(a.size, n , replace=0)
     ...: nums = np.random.choice(a.size, n , replace=0)
     ...: a[set_idx] = nums
     ...: 

In [289]: %timeit shuffle_sparse_arr(a)
1 loops, best of 3: 647 ms per loop

In [290]: %timeit shuffle_sparse_arr_hackish(a)
10 loops, best of 3: 29.1 ms per loop

In [291]: %timeit np.random.shuffle(a)
1 loops, best of 3: 606 ms per loop
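
The timings suggest why Approach #1 gains nothing over a full shuffle: the legacy `np.random.choice` with `replace=False` permutes the whole index range before truncating it. A possible variant, assuming NumPy 1.17 or later and a 1-D input, draws the positions through a `np.random.default_rng()` generator instead, which may avoid the full permutation when only a few positions are requested. The name `shuffle_sparse_arr_rng` and the optional `rng` parameter are made up for this sketch, and no timing is claimed for it.

```python
import numpy as np

def shuffle_sparse_arr_rng(a, rng=None):
    """Sketch of Approach #1 on top of numpy.random.Generator.

    Assumes `a` is 1-D. `rng` is an optional Generator, passed in so that
    a Monte Carlo loop can reuse one generator (hypothetical parameter).
    """
    if rng is None:
        rng = np.random.default_rng()
    out = np.zeros_like(a)
    vals = a[a != 0]
    # Distinct target positions in random order (shuffle=True is the default)
    idx = rng.choice(a.size, size=vals.size, replace=False)
    out[idx] = vals
    return out

# Usage: a small sparse array with three non-zero entries
a = np.zeros(1000, dtype=int)
a[[3, 50, 999]] = [7, 8, 9]
shuffled = shuffle_sparse_arr_rng(a, np.random.default_rng(0))
print(np.count_nonzero(shuffled))
```

Measuring it with `%timeit` on the 15-million-element array from the runtime test would show whether it helps on a given NumPy version.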
Divakar answered Mar 25 '23 02:03