Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Numpy to generate random combinations of two arrays without repetition

Given two arrays, for example [0,0,0] and [1,1,1], it is already clear (see here) how to generate all the combinations, i.e., [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]. itertools (combinations or product) and numpy.meshgrid are the most common ways as far as I know.

However, I could't find any discussions on how to generate this combinations randomly, without repetitions.

An easy solution could be to generate all the combinations and then choose some of them randomly. For example:

# Three random combinations of [0,0,0] and [1,1,1]
comb = np.array(np.meshgrid([0,1],[0,1],[0,1])).T.reshape(-1,3)
result = comb[np.random.choice(len(comb),3,replace=False),:]

However, this is infeasible when the number of combinations is too big.

Is there a way to generate random combinations without replacement in Python (possibly with Numpy) without generating all the combinations?

EDIT: You can notice in the accepted answer that we also got for free a technique to generate random binary vectors without repetitions, which is just a single line (described in the Bonus Section).

like image 676
Gioker Avatar asked Dec 19 '16 06:12

Gioker


1 Answers

Here's a vectorized approach without generating all combinations -

def unique_combs(A, N):
    # A : 2D Input array with each row representing one group
    # N : No. of combinations needed
    m,n = A.shape
    dec_idx = np.random.choice(2**m,N,replace=False)
    idx = ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
    return  A[np.arange(m),idx]

Please note that this assumes we are dealing with equal number of elements per group.

Explanation

To give it a bit of explanation, let's say the groups are stored in a 2D array -

In [44]: A
Out[44]: 
array([[4, 2],   <-- group #1
       [3, 5],   <-- group #2
       [8, 6]])  <-- group #3

We have two elems per group. Let's say we are looking for 4 unique group combinations : N = 4. To select from two numbers from each of those three groups, we would have a total of 8 unique combinations.

Let's generate N unique numbers in that interval of 8 using np.random.choice(8, N, replace=False) -

In [86]: dec_idx = np.random.choice(8,N,replace=False)

In [87]: dec_idx
Out[87]: array([2, 3, 7, 0])

Then, convert those to binary equivalents as later on we need those to index into each row of A -

In [88]: idx = ((dec_idx[:,None] & (1 << np.arange(3)))!=0).astype(int)

In [89]: idx
Out[89]: 
array([[0, 1, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 0, 0]])

Finally, with fancy-indexing, we get those elems off A -

In [90]: A[np.arange(3),idx]
Out[90]: 
array([[4, 5, 8],
       [2, 5, 8],
       [2, 5, 6],
       [4, 3, 8]])

Sample run

In [80]: # Original code that generates all combs
    ...: comb = np.array(np.meshgrid([4,2],[3,5],[8,6])).T.reshape(-1,3)
    ...: result = comb[np.random.choice(len(comb),4,replace=False),:]
    ...: 

In [81]: A = np.array([[4,2],[3,5],[8,6]]) # 2D array of groups

In [82]: unique_combs(A, 3) # 3 combinations
Out[82]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 3, 6]])

In [83]: unique_combs(A, 4) # 4 combinations
Out[83]: 
array([[2, 3, 8],
       [4, 3, 6],
       [2, 5, 6],
       [4, 5, 8]])

Bonus section

Explanation on ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int) :

That step is basically converting decimal numbers to binary equivalents. Let's break it down to smaller steps for a closer look.

1) Input array of decimal numbers -

In [18]: dec_idx
Out[18]: array([7, 6, 4, 0])

2) Convert to 2D upon inserting new axis with None/np.newaxis -

In [19]: dec_idx[:,None]
Out[19]: 
array([[7],
       [6],
       [4],
       [0]])

3) Let's assume m = 3, i.e. we want to convert to 3 binary digit number equivalents.

We create 2-powered range array with bit-shift operation -

In [16]: (1 << np.arange(m))
Out[16]: array([1, 2, 4])

Alternatively, an explicit way would be -

In [20]: 2**np.arange(m)
Out[20]: array([1, 2, 4])

4) Now, the crux of the cryptic step there. We perform broadcasted bitwise AND-ind between 2D dec_idx and 2-powered range array.

Consider the first element from dec_idx : 7. We are performing bitiwse AND-ing of 7 against 1, 2, 4. Think of it as a filtering process, as we filter 7 at each binary interval of 1, 2, 4 as they represent the three binary digits. Similarly, we do this for all elems off dec_idx in a vectorized manner with broadcasting.

Thus, we would get the bit-wise AND-ing results like so -

In [43]: (dec_idx[:,None] & (1 << np.arange(m)))
Out[43]: 
array([[1, 2, 4],
       [0, 2, 4],
       [0, 0, 4],
       [0, 0, 0]])

The filtered numbers thus obtained are either 0 or the 2-powered range array numbers themselves. So, to have the binary equivalents, we just need to consider all non-zeros as 1s and zeros as 0s.

In [44]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0)
Out[44]: 
array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True],
       [False, False, False]], dtype=bool)

In [45]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
Out[45]: 
array([[1, 1, 1],
       [0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]])

Thus, we have the binary numbers with MSBs to the right.

like image 67
Divakar Avatar answered Oct 23 '22 17:10

Divakar