Python memory mapping


I am working with big data and have matrices with sizes like 2000x100000. To work faster, I tried using numpy.memmap to avoid holding these large matrices in memory, because of RAM limitations. The problem is that when I load the same matrix into two variables, one with numpy.load and the other with numpy.memmap, the contents are not the same. Is this normal? I am using the same data type in memmap as in my data. Example:

A1 = numpy.load('mydata.npy')
A2 = numpy.memmap('mydata.npy',dtype=numpy.float64, mode='r', shape=(2000,2000))
A1[0, 0]  # 0
A2[0, 0]  # 1.8758506894003703e-309

Those are the contents of the first element of the array in each case. The correct value is 0, but I get this strange number when using memmap. Thank you.


asked Dec 10 '14 12:12

azal

1 Answer

The NPY format is not simply a dump of the array's data to a file. It includes a header that contains, among other things, the metadata that defines the array's data type and shape. When you use memmap directly, as you have done, the memory map starts at the beginning of the file and interprets the header bytes as array data, which is why the values are wrong. To create a memory-mapped view of an NPY file, use the mmap_mode option of np.load.

Here's an example. First, create an NPY file:

In [1]: a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

In [2]: np.save('a.npy', a)

Read it back in with np.load:

In [3]: a1 = np.load('a.npy')

In [4]: a1
Out[4]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Incorrectly view the file with memmap:

In [5]: a2 = np.memmap('a.npy', dtype=np.float64, mode='r', shape=(2, 3))

In [6]: a2
Out[6]: 
memmap([[  1.87585069e-309,   1.17119999e+171,   5.22741680e-037],
       [  8.44740097e+252,   2.65141232e+180,   9.92152605e+247]])
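The first value above is not random: it appears to be the start of the NPY header read as a float. As a minimal sketch, assuming a version 1.0 NPY file and a little-endian float64 interpretation (the variable names here are only illustrative), the first 8 bytes of the file can be reinterpreted by hand:

```python
import numpy as np

# The first 8 bytes of a version 1.0 NPY file: the magic string
# b"\x93NUMPY" followed by the major and minor version bytes.
magic_and_version = b"\x93NUMPY\x01\x00"

# Reinterpret those 8 bytes as a single little-endian float64, which is
# what a raw float64 memmap does with them on a little-endian machine.
first_value = np.frombuffer(magic_and_version, dtype="<f8")[0]

# Prints a tiny subnormal number, about 1.8758e-309, matching the
# first element reported in the question.
print(first_value)
```

The remaining garbage values come from the rest of the header text being read the same way.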

Create a memmap using np.load with the option mmap_mode='r':

In [7]: a3 = np.load('a.npy', mmap_mode='r')

In [8]: a3
Out[8]: 
memmap([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
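If np.memmap has to be used directly for some reason, the header must be skipped with its offset argument. The following is a sketch, not the method from the answer: it assumes a version 1.0 file (which np.save writes for a plain float64 array) and uses the helper functions in numpy.lib.format to find where the data begins. The temporary path and the name a4 are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

# Recreate the small example file in a temporary directory.
a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
path = os.path.join(tempfile.mkdtemp(), "a.npy")
np.save(path, a)

with open(path, "rb") as fp:
    # Read the magic string and the format version.
    version = npy_format.read_magic(fp)
    if version != (1, 0):
        raise ValueError("this sketch only handles version 1.0 files")
    # Parse the header; afterwards the file position is at the array data.
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    data_offset = fp.tell()

# Map only the data region, using the metadata taken from the header.
a4 = np.memmap(path, dtype=dtype, mode="r", shape=shape,
               offset=data_offset, order="F" if fortran_order else "C")
print(a4)
```

In practice np.load with mmap_mode does this bookkeeping for you, so the manual route is mainly useful for understanding the file layout.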


answered Sep 27 '22 20:09

Warren Weckesser


Tags: python, numpy
