I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At first I used this code:
N = 30000
p = 0.1
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
But sadly it does not seem to terminate for this big N. So I tried to split it up into the generation of the single rows by doing this:
N = 30000
p = 0.1
mask = np.empty((N, N))
for i in range(N):
    mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
    if i % 100 == 0:
        print(i)
Now something strange happens (at least on my device): the first ~1100 rows are generated very quickly, but after that the code becomes horribly slow. Why is this happening? What am I missing here? Are there better ways to create a big matrix which has True entries with probability p and False entries with probability 1-p?
Edit: Many of you assumed that RAM would be the problem. The device that will run the code has almost 500GB of RAM, so this won't be an issue.
A boolean array can be created manually by passing dtype=bool when creating the array. Values other than 0, None, False or empty strings are considered True. Alternatively, NumPy automatically creates a boolean array when comparisons are made between arrays and scalars or between arrays of the same shape.
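The comparison route mentioned above can be applied directly to the question. The following is a minimal sketch, not the asker's or answerer's code: it assumes NumPy's `Generator` API, and the names `random_bool_matrix` and `seed` are made up for illustration. It draws uniform floats in [0, 1) and compares them with p, so each entry is True with probability p. (In the question's call, pairing `a=[False, True]` with `p=[p, 1-p]` assigns probability p to False rather than True.)

```python
import numpy as np

def random_bool_matrix(n, p, seed=None):
    """Return an (n, n) boolean array whose entries are True with probability p.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    # rng.random draws uniform floats in [0, 1); the comparison yields dtype=bool.
    return rng.random((n, n)) < p

mask = random_bool_matrix(1000, 0.1, seed=0)
print(mask.dtype, mask.shape, mask.mean())
```

The intermediate float array costs 8 bytes per element before it is reduced to one byte per boolean, so for a very large N it may be preferable to generate the matrix in blocks of rows.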
The problem is your RAM: the values are stored in memory as the matrix is being created. I just created this matrix using this command:
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
I used an AWS i3 instance with 64GB of RAM and 8 cores. To create this matrix, htop shows that it takes up ~20GB of RAM. Here is a benchmark in case you care:
time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
Wall time: 21.7 s

def mask_method(N, p):
    for i in range(N):
        mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
        if (i % 100 == 0):
            print(i)

time mask_method(N,p)
CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
Wall time: 22.5 s
Note that the mask method only takes up ~9GB of RAM at its peak.
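For context on these memory figures, here is a rough back-of-the-envelope sketch. It relies on one fact about NumPy: `np.empty((N, N))` without a `dtype` argument allocates float64, i.e. 8 bytes per element, whereas `dtype=bool` needs 1 byte per element. The helper name `matrix_gib` is an assumption made for this example. The result is consistent with a peak of several GB for the row-wise method, though it does not by itself prove where all of the measured memory goes.

```python
import numpy as np

N = 30000  # matrix side length from the question

def matrix_gib(n, dtype):
    """Size in GiB of an (n, n) array of the given dtype, without allocating it.

    Hypothetical helper for illustration.
    """
    return n * n * np.dtype(dtype).itemsize / 2**30

float_gib = matrix_gib(N, np.float64)  # dtype that np.empty((N, N)) uses by default
bool_gib = matrix_gib(N, np.bool_)     # dtype=bool stores one byte per element
print(f"float64: {float_gib:.2f} GiB, bool: {bool_gib:.2f} GiB")
```

Allocating the mask with `np.empty((N, N), dtype=bool)` would therefore cut the size of the stored matrix by a factor of eight.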
Edit: The first method frees the RAM after the process is done, whereas the function method retains all of it.
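If peak memory is the concern, one possible variation is to fill a preallocated boolean array in blocks of rows, so that only one block of temporary random floats exists at a time. This is a sketch under assumptions, not a benchmarked solution: `fill_bool_matrix`, `block_rows` and `seed` are illustrative names, and the block size of 1000 is arbitrary.

```python
import numpy as np

def fill_bool_matrix(n, p, block_rows=1000, seed=None):
    """Fill an (n, n) bool array block by block; entries are True with probability p.

    Hypothetical helper for illustration; block_rows is an arbitrary tuning knob.
    """
    rng = np.random.default_rng(seed)
    mask = np.empty((n, n), dtype=bool)  # one byte per element
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Only this block's temporary float64 samples are alive at any one time.
        mask[start:stop] = rng.random((stop - start, n)) < p
    return mask

mask = fill_bool_matrix(2000, 0.1, block_rows=300, seed=0)
print(mask.dtype, mask.shape, mask.mean())
```

With this layout the temporary buffer is roughly `block_rows * n * 8` bytes, independent of the total number of rows.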