
How much memory does a numpy array use? Is RAM a limiting factor?

I'm using numpy to create a cube array with sides of length 100, thus containing 1 million entries total. For each of the million entries, I am inserting a 100x100 matrix whose entries are comprised of randomly generated numbers. I am using the following code to do so:

import random
from numpy import *

cube = arange(1000000).reshape(100,100,100)

for element in cube.flat:
    matrix = arange(10000).reshape(100,100)
    for entry in matrix.flat:
        entry = random.random()*100
    element = matrix

I was expecting this to take a while, but with 10 billion random numbers being generated, I'm not sure my computer can even handle it. How much memory would such an array take up? Would RAM be a limiting factor, i.e. if my computer doesn't have enough RAM, could it fail to actually generate the array?

Also, if there is a more efficient way to implement this code, I would appreciate tips :)

asked Jun 28 '12 by aensm




2 Answers

for the "inner" part of your function, look at the numpy.random module

import numpy as np
matrix = np.random.random((100,100))*100
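The same module can also generate the whole structure in one vectorized call. As a rough sketch (using a deliberately smaller 10x10x10 cube, since the full 100x100x100 version would be a 5-D array of roughly 80 GB):

import numpy as np

# 10x10x10 cube of 100x100 matrices, i.e. a 5-D array,
# filled with uniform random floats in [0, 100) in a single call
cube = np.random.random((10, 10, 10, 100, 100)) * 100
print(cube.shape)   # (10, 10, 10, 100, 100)
print(cube.nbytes)  # 10**3 * 100**2 * 8 = 80,000,000 bytes (80 MB)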
answered Oct 02 '22 by Phil Cooper


A couple points:

  • The size in memory of numpy arrays is easy to calculate. It's simply the number of elements times the data size, plus a small constant overhead. For example, if your cube.dtype is int64, and it has 1,000,000 elements, it will require 1000000 * 64 / 8 = 8,000,000 bytes (8 MB).
  • However, as @Gabe notes, 100 * 100 * 1,000,000 doubles will require about 80 GB.
  • This will not cause anything to "break", per se, but operations will be ridiculously slow because of all the swapping your computer will need to do.
  • Your loops will not do what you expect. Instead of replacing the elements of cube, element = matrix simply rebinds the loop variable element, leaving the cube unchanged. The same goes for entry = random.random() * 100 (a minimal sketch of the fix follows this list).
  • Instead, see: http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
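To make the last two points concrete, here is a minimal sketch using a deliberately small 10x10x10 cube so it fits in memory (the nbytes attribute and np.nditer are standard NumPy; the values are just illustrative):

import numpy as np

cube = np.zeros((10, 10, 10))   # 1,000 float64 elements
print(cube.nbytes)              # 1000 * 8 = 8000 bytes

# Wrong: the loop variable is rebound, the array is untouched
for element in cube.flat:
    element = 42.0
print(cube.sum())               # still 0.0

# Right: write through a readwrite nditer ...
for entry in np.nditer(cube, op_flags=['readwrite']):
    entry[...] = np.random.random() * 100

# ... or skip the loop entirely and assign in one vectorized step
cube[...] = np.random.random((10, 10, 10)) * 100
print(cube.sum() > 0)           # True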
answered Oct 02 '22 by David Wolever