Create large random boolean matrix with numpy

Tags: python, random, numpy

I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At first I used this code:

    import numpy as np

    N = 30000
    p = 0.1
    np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
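In `np.random.choice`, the weights in `p` are paired element-wise with the values in `a`, so this call makes each entry False with probability p and True with probability 1-p. A small-scale sanity check of that mapping, with N reduced to 1000 (an assumption made only so it runs quickly) and a fixed seed for reproducibility:

```python
import numpy as np

N = 1000  # reduced from 30000 so the check runs quickly
p = 0.1

np.random.seed(0)  # fixed seed so the result is reproducible
m = np.random.choice(a=[False, True], size=(N, N), p=[p, 1 - p])

# The result has dtype bool because `a` contains bools.
# The fraction of True entries should be close to 1 - p = 0.9, not p.
true_fraction = m.mean()
print(m.dtype, true_fraction)
```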

Unfortunately, it does not seem to terminate for such a large N, so I tried to split it up and generate one row at a time:

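    import numpy as np

    N = 30000
    p = 0.1
    # np.empty defaults to float64, so this array uses 8 bytes per entry (~7.2 GB)
    mask = np.empty((N, N))
    for i in range(N):
        mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
        if i % 100 == 0:
            print(i)  # progress indicator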

Now something strange happens (at least on my machine): the first ~1100 rows are generated very quickly, but after that the code becomes horribly slow. Why is this happening? What am I missing here? Are there better ways to create a big matrix which has True entries with probability p and False entries with probability 1-p?

Edit: Since many of you assumed that RAM would be the problem: the machine that will run the code has almost 500 GB of RAM, so this should not be an issue.


asked Apr 20 '17 19:04

zimmerrol

1 Answer

The problem is your RAM: the values are stored in memory while the matrix is being created. I just created this matrix using this command:

np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])

I used an AWS i3 instance with 64 GB of RAM and 8 cores. According to htop, creating this matrix takes up ~20 GB of RAM. Here is a benchmark in case you care:

    %time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
    CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
    Wall time: 21.7 s

    def mask_method(N, p):
        # fills the `mask` array from the question, one row at a time
        for i in range(N):
            mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
            if i % 100 == 0:
                print(i)

    %time mask_method(N, p)
    CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
    Wall time: 22.5 s

Note that the mask method only takes up ~9 GB of RAM at its peak.
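That peak is in the same range as the size of `mask` itself: `np.empty((N, N))` in the question uses the default float64 dtype, i.e. 8 bytes per entry, while a bool array needs 1 byte per entry. A back-of-envelope check (pure arithmetic, nothing is allocated):

```python
import numpy as np

N = 30000
entries = N * N  # 900,000,000 entries

# Size in GB (10**9 bytes) of an N x N array for each dtype; no array is created.
sizes_gb = {}
for name, dtype in [("float64", np.float64), ("bool", np.bool_)]:
    sizes_gb[name] = entries * np.dtype(dtype).itemsize / 1e9
    print(f"{name}: {sizes_gb[name]:.1f} GB")
# float64: 7.2 GB
# bool: 0.9 GB
```

Passing `dtype=bool` to `np.empty` would therefore shrink the preallocated array by a factor of 8.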

Edit: The first method releases the RAM once it is done, whereas the function method retains all of it.
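As for the "better ways" part of the question, one common alternative (not the method used in this answer) is to compare uniform random numbers against p, which yields True with probability p, and to do it in blocks of rows written into a preallocated bool array. This is a sketch under stated assumptions: the function name `random_bool_matrix` and the block size `chunk_rows=1000` are hypothetical choices, and it uses NumPy's `np.random.default_rng` Generator API. It follows the question text (True with probability p); to match the `np.random.choice` call above instead (True with probability 1-p), compare against `1 - p`.

```python
import numpy as np


def random_bool_matrix(n, p, chunk_rows=1000, seed=None):
    """Hypothetical helper: n x n bool matrix whose entries are True with probability p.

    The only large allocation is the bool result (1 byte per entry). Each
    temporary block of uniform floats holds at most chunk_rows * n float64 values.
    """
    rng = np.random.default_rng(seed)
    out = np.empty((n, n), dtype=bool)
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        # Uniform samples in [0, 1); each one is below p with probability p.
        out[start:stop] = rng.random((stop - start, n)) < p
    return out


if __name__ == "__main__":
    # Small example; for the question's case use random_bool_matrix(30000, 0.1).
    m = random_bool_matrix(2000, 0.1, chunk_rows=300, seed=42)
    print(m.dtype, m.shape, m.mean())
```

With N = 30000 the result occupies about 0.9 GB, and with `chunk_rows=1000` each temporary float block is about 240 MB.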


answered Sep 19 '22 21:09

gold_cy
