Python memory mapping

Tags: python, numpy

I am working with large data sets and have matrices of sizes such as 2000x100000. To work faster and avoid holding these large matrices in RAM, I tried numpy.memmap. The problem is that when I load the same matrix into two variables, one with numpy.load and the other with numpy.memmap, the contents differ. Is this normal? I am using the same data type in memmap as in my data. Example:

import numpy

A1 = numpy.load('mydata.npy')
A2 = numpy.memmap('mydata.npy', dtype=numpy.float64, mode='r', shape=(2000, 2000))
A1[0, 0]  # 0
A2[0, 0]  # 1.8758506894003703e-309

Those are the values of the first element of the array in each case. The correct value is 0, but memmap gives me this strange number. Thank you.

Asked by azal, Dec 10 '14 12:12


1 Answer

The NPY format is not simply a dump of the array's data to a file. It includes a header that contains, among other things, the metadata that defines the array's data type and shape. When you use memmap directly, as you have done, the memory map starts at the beginning of the file and does not account for this header, so the header bytes are interpreted as array data. To create a memory-mapped view of an NPY file, use the mmap_mode option of np.load.

Here's an example. First, create a NPY file:

In [1]: a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

In [2]: np.save('a.npy', a)

Read it back in with np.load:

In [3]: a1 = np.load('a.npy')

In [4]: a1
Out[4]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Incorrectly view the file with memmap:

In [5]: a2 = np.memmap('a.npy', dtype=np.float64, mode='r', shape=(2, 3))

In [6]: a2
Out[6]: 
memmap([[  1.87585069e-309,   1.17119999e+171,   5.22741680e-037],
       [  8.44740097e+252,   2.65141232e+180,   9.92152605e+247]])
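
The odd first value can be traced back to the file header. As a small sketch, assuming a version 1.0 NPY file read on a little-endian machine, the first 8 bytes of the file are the magic string and the version bytes, and reinterpreting them as a float64 should give the number seen above. The name npy_prefix is only for illustration:

```python
import numpy as np

# First 8 bytes of a version 1.0 NPY file: the magic string b"\x93NUMPY"
# followed by the major and minor version bytes (assumed to be 1 and 0).
npy_prefix = b"\x93NUMPY\x01\x00"

# Reinterpret those 8 bytes as one little-endian float64, which is what a
# memmap starting at offset 0 does with the beginning of the file.
first_value = np.frombuffer(npy_prefix, dtype="<f8")[0]
print(first_value)
```

The remaining values in the output come from the rest of the header text and, depending on the header length, part of the actual data.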

Create a memmap using np.load with the option mmap_mode='r':

In [7]: a3 = np.load('a.npy', mmap_mode='r')

In [8]: a3
Out[8]: 
memmap([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
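
If np.memmap has to be called directly, it needs to skip the header through its offset argument. The following is a minimal sketch rather than a recommended approach: it takes the header size from the offset attribute of the memmap that np.load returns, and the temporary file path and the names a4 and header_size are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location: a temporary directory used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "a.npy")
a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
np.save(path, a)

# np.load with mmap_mode returns a memmap that already skips the header.
a3 = np.load(path, mmap_mode="r")
header_size = a3.offset  # number of bytes in the file before the array data

# Passing the same offset lines a plain np.memmap up with the data.
a4 = np.memmap(path, dtype=a3.dtype, mode="r", shape=a3.shape, offset=header_size)

same = np.array_equal(a, a3) and np.array_equal(a, a4)
print(header_size, same)
```

In practice np.load with mmap_mode is simpler, because the dtype, shape, and offset are all read from the header instead of being supplied by hand.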

Answered by Warren Weckesser, Sep 27 '22 20:09