The file contains 2000000 rows; each row contains 208 columns separated by commas, like this:
0.0863314058048,0.0208767447842,0.03358010485,0.0,1.0,0.0,0.314285714286,0.336293217457,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
The program reads this file into a NumPy ndarray, and I expected it to consume about (2000000 * 208 * 8 B) = 3.2 GB of memory.
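For reference, a quick sketch of where that estimate comes from, assuming every value is stored as a float64:

import numpy as np

# Each value is a float64 (8 bytes), so the final array should need
# rows * cols * itemsize bytes.
expected = 2000000 * 208 * np.dtype(np.float64).itemsize
print(expected)  # 3328000000 bytes, i.e. roughly 3 GB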
However, when the program actually reads the file, it consumes about 20 GB of memory.
Why does my program use so much more memory than I expected?
I'm using NumPy 1.9.0, and the memory inefficiency of np.loadtxt() and np.genfromtxt() seems to be directly related to the fact that both of them build temporary Python lists to store the data while parsing.
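To see why that matters at this scale, here is a simplified sketch of list-based parsing; this is only an illustration of the pattern, not NumPy's actual code, and 'data.txt' is a placeholder file name:

import numpy as np

# Every parsed value becomes a separate Python float object (about 24 bytes
# each on 64-bit CPython) plus an 8-byte list slot, so holding all
# 2000000 * 208 values in nested lists costs several times more memory
# than the final 3.2 GB float64 array, which is only built at the end.
rows = []
with open('data.txt') as f:
    for line in f:
        rows.append([float(x) for x in line.split(',')])
arr = np.array(rows)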
If you know the shape of your array beforehand, you can write a file reader that consumes an amount of memory very close to the theoretical amount (about 3.2 GB in this case), by parsing the data directly into a preallocated array of the corresponding dtype:
import numpy as np

def read_large_txt(path, delimiter=None, dtype=None):
    with open(path) as f:
        # First pass: count the rows.
        nrows = sum(1 for line in f)
        f.seek(0)
        # Infer the number of columns from the first line.
        ncols = len(next(f).split(delimiter))
        # Preallocate the output array with the final shape and dtype.
        out = np.empty((nrows, ncols), dtype=dtype)
        f.seek(0)
        # Second pass: parse each line straight into the preallocated array.
        for i, line in enumerate(f):
            out[i] = line.split(delimiter)
    return out
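For example, for the file described in the question (the file name here is just a placeholder):

data = read_large_txt('data.txt', delimiter=',', dtype=np.float64)
print(data.shape)   # (2000000, 208)
print(data.nbytes)  # 3328000000 bytes, close to the expected figure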