
Why does this giant (non-sparse) numpy matrix fit in RAM

I am very confused by what is reported by numpy.ndarray.nbytes.

I just created a 1 million × 1 million (10^6 × 10^6) identity matrix, which therefore has 1 trillion (10^12) elements. x.nbytes reports that this array takes 7.28 TiB, yet the Python process only uses 3.98 GB of memory, as reported by the OS X Activity Monitor.

  • Is the whole array contained in memory?
  • Does Numpy somehow compress its representation, or is that handled by the OS?
  • If I simply calculate y = 2 * x, which should be the same size as x, the process memory increases to about 30GB, until it gets killed by the OS. Why, and what kind of operations can I conduct on x without the memory usage expanding so much?

This is the code I used:

import numpy as np
x = np.identity(10**6)  # integer size; newer NumPy versions reject a float such as 1e6
x.size
# 1000000000000
x.nbytes / 1024 ** 4
# 7.275957614183426
y = 2 * x
# python console exits and terminal shows: Killed: 9
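The 7.28 figure can be reproduced by hand: nbytes is computed from the shape and dtype alone (size × itemsize), so it says nothing about how much of the array is actually resident in RAM. A pure-Python sketch that allocates nothing, assuming the default float64 dtype of np.identity (8 bytes per element):

```python
# Reproduce x.size and x.nbytes for np.identity(10**6) without allocating it.
# Assumption: float64 elements, i.e. 8 bytes each (np.identity's default dtype).
n = 10**6
itemsize = 8               # bytes per float64 element
size = n * n               # what x.size reports
nbytes = size * itemsize   # what x.nbytes reports: size * itemsize

print(size)                 # 1000000000000
print(nbytes / 1024**4)     # about 7.276, the size in TiB
```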
Rems, asked Feb 13 '16 16:02


2 Answers

On Linux (and I'm assuming the same thing happens under Mac), when a program allocates memory, the OS doesn't actually back it with physical RAM until the program first uses it.
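The lazy allocation described above can be observed with a minimal sketch that uses only the Python standard library instead of NumPy. Assumptions: a Unix-like OS (the resource module does not exist on Windows) and default overcommit settings; the helper peak_rss_mib and the sizes are illustrative choices. It reserves a 256 MiB anonymous mapping, writes one byte to each page of the first 64 MiB, and compares the process's peak resident set size after each step.

```python
import mmap
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only, illustrative helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    return peak / 2**20 if sys.platform == "darwin" else peak / 2**10


RESERVE = 256 * 2**20  # virtual size of the mapping
TOUCH = 64 * 2**20     # the part of it we actually write to

before = peak_rss_mib()
# fileno=-1 gives an anonymous mapping; nothing is backed by RAM yet.
buf = mmap.mmap(-1, RESERVE, flags=mmap.MAP_PRIVATE)
after_reserve = peak_rss_mib()

# The first write to each page makes the OS back that page with real RAM.
for offset in range(0, TOUCH, mmap.PAGESIZE):
    buf[offset] = 1
after_touch = peak_rss_mib()

print(f"reserved {RESERVE / 2**20:.0f} MiB: peak RSS grew {after_reserve - before:.1f} MiB")
print(f"touched  {TOUCH / 2**20:.0f} MiB: peak RSS grew {after_touch - before:.1f} MiB")
buf.close()
```

On a typical Linux machine the first line should show little or no growth and the second roughly 64 MiB; exact numbers vary, and ru_maxrss is a high-water mark, so it never goes back down.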

If the program never uses the memory, the OS doesn't have to waste RAM on it. The downside is that the OS can end up in a bind when a program has requested a huge amount of memory, actually starts using it, and there isn't enough RAM to go around.

When that happens, the OS may either kill other processes and give their RAM to the requesting process, or kill the requesting process itself (which is what is happening here).

The initial 4GB of memory that Python uses is most likely the pages where NumPy wrote the ones on the diagonal of the identity matrix; the rest of the pages haven't been touched yet. A math operation like 2*x has to create a new array of the same size and write every element of it, which touches (and thus allocates) all of its pages until the OS runs out of memory and kills your process.
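Following the same reasoning, operations that only read and write pages NumPy has already touched stay cheap. A sketch, using a small n (an assumption, so it runs anywhere) in place of 10**6: it scales the diagonal in place through a strided view instead of computing 2 * x. For simply setting the diagonal, np.fill_diagonal(x, value) is a built-in alternative.

```python
import numpy as np

# Small stand-in for the 10**6 x 10**6 matrix from the question.
n = 1_000
x = np.identity(n)

# Strided *view* of the diagonal: stepping n + 1 elements through the flattened,
# C-contiguous array visits x[0, 0], x[1, 1], ... without copying anything.
diag = x.reshape(-1)[:: n + 1]
diag *= 2  # in place: reads and writes only the diagonal elements

# By contrast, 2 * x allocates and writes a full-size result, and even an
# in-place x *= 2 rewrites every element, so both touch every page.
print(np.shares_memory(diag, x))          # True
print(np.array_equal(x, 2 * np.eye(n)))   # True
```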

Colonel Thirty Two, answered Oct 03 '22 18:10


The system initially allocates memory only virtually; it is backed by physical RAM only the first time you write to it. In your example, you allocate 1 trillion numbers (8 × 10^12 bytes), which corresponds to about 2 billion 4 KiB memory pages, but only 1 million (1e6) of those pages are written when the ones on the diagonal are set, one page per row. That is exactly the roughly 4GB of memory you see.
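The page accounting in this answer can be checked with a few lines of arithmetic. A sketch, assuming float64 elements (8 bytes) and 4 KiB pages; the actual page size depends on the OS and CPU.

```python
# Back-of-the-envelope page accounting for np.identity(10**6).
# Assumptions: 8-byte float64 elements and 4 KiB pages.
n = 10**6
itemsize = 8
page = 4096

total_bytes = n * n * itemsize
total_pages = -(-total_bytes // page)  # ceiling division

# Consecutive diagonal elements are (n + 1) * itemsize = 8,000,008 bytes apart,
# far more than one page, so every diagonal 1.0 lands on a different page.
assert (n + 1) * itemsize > page
touched_pages = n
touched_bytes = touched_pages * page

print(f"{total_pages:,} pages reserved")  # 1,953,125,000 pages reserved
print(f"{touched_bytes / 1e9:.2f} GB ({touched_bytes / 2**30:.2f} GiB) touched")  # 4.10 GB (3.81 GiB) touched
```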

Daniel, answered Oct 03 '22 19:10