Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently generating multiple instances of numpy.random.choice without replacement

I'm new to Python. While reading, please mention any other suggestions regarding ways to improve my Python code.

Question: How do I generate a 8xN dimensional array in Python containing random numbers? The constraint is that each column of this array must contain 8 draws without replacement from the integer set [1,8]. More specifically, when N = 10, I want something like this.

[[ 6.  2.  3.  4.  7.  5.  5.  7.  8.  4.]
 [ 1.  4.  5.  5.  4.  4.  8.  5.  7.  5.]
 [ 7.  3.  8.  8.  3.  8.  7.  3.  6.  7.]
 [ 3.  6.  7.  1.  5.  6.  2.  1.  5.  1.]
 [ 8.  1.  4.  3.  8.  2.  3.  4.  3.  3.]
 [ 5.  8.  1.  7.  1.  3.  6.  8.  1.  6.]
 [ 4.  5.  2.  6.  2.  1.  1.  6.  4.  2.]
 [ 2.  7.  6.  2.  6.  7.  4.  2.  2.  8.]]

To do this I use the following approach:

import numpy.random
import numpy
def rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

In practice N will be ~1e7. The algorithm above is O(n) in time and it takes roughly .38 secs when N=1e3. The time therefore when N = 1e7 is ~1hr (i.e. 3800 secs). There has to be a much more efficient way.

Timing the function

from timeit import Timer 
t = Timer(lambda: rand_M(1000))
print(t.timeit(5))
0.3863314103162543
like image 388
Jacob H Avatar asked Aug 12 '15 03:08

Jacob H


2 Answers

Create a random array of specified shape and then sort along the axis where you want to keep the limits, thus giving us a vectorized and very efficient solution. This would be based on this smart answer to MATLAB randomly permuting columns differently. Here's the implementation -

Sample run -

In [122]: N = 10

In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
Out[123]: 
array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
       [8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
       [1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
       [5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
       [2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
       [6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
       [4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
       [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)

Runtime tests -

In [124]: def sortbased_rand8(N):
     ...:     return np.argsort(np.random.rand(8,N),axis=0)+1
     ...: 
     ...: def rand_M(N):
     ...:     M = np.zeros(shape = (8, N))
     ...:     for i in range (0, N):
     ...:         M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 
     ...:     return M
     ...: 

In [125]: N = 5000

In [126]: %timeit sortbased_rand8(N)
100 loops, best of 3: 1.95 ms per loop

In [127]: %timeit rand_M(N)
1 loops, best of 3: 233 ms per loop

Thus, awaits a 120x speedup!

like image 141
Divakar Avatar answered Sep 23 '22 04:09

Divakar


How about shuffling, that is to say, permuting?

import random
import numpy
from timeit import Timer 

def B_rand_M(N):
    a = numpy.arange(1,9)
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.permutation(a)
    return M

# your original implementation
def J_rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

some timings:

def compare(N):
    for f in (J_rand_M, B_rand_M):
        t = Timer(lambda: f(N)).timeit(6)
        print 'time for %s(%s): %.6f' % (f.__name__, N, t)

for i in range(6):
    print 'N = 10^%s' % i
    compare(10**i)
    print

gives

N = 10^0
time for J_rand_M(1): 0.001199
time for B_rand_M(1): 0.000080

N = 10^1
time for J_rand_M(10): 0.001112
time for B_rand_M(10): 0.000335

N = 10^2
time for J_rand_M(100): 0.011118
time for B_rand_M(100): 0.003022

N = 10^3
time for J_rand_M(1000): 0.110887
time for B_rand_M(1000): 0.030528

N = 10^4
time for J_rand_M(10000): 1.100540
time for B_rand_M(10000): 0.304696

N = 10^5
time for J_rand_M(100000): 11.151576
time for B_rand_M(100000): 3.049474
like image 37
Béla Avatar answered Sep 25 '22 04:09

Béla