Correct way to do operations on Memmapped arrays

Tags: python, numpy

Would it be more memory efficient to compute np.argsort separately into a temporary memmapped array, then np.argsort(np.argsort) into another temporary memmapped array, and only then do the operation? The argsort array of a 20 GB array would itself be pretty huge!

I think these questions will help me clarify the inner workings of memmapped arrays in Python!

Thanks...

asked Aug 29 '14 11:08

user1265125

1 Answer

I'll try to answer part 2 first, then parts 1 and 3.

First, arr = <something> is a simple variable assignment (it rebinds the name arr), whereas arr[:] = <something> assigns to the contents of the array. In the code below, arr is still a memmapped array after arr[:] = x, whereas after arr = x it is an ndarray.

>>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
>>> type(arr)
<class 'numpy.core.memmap.memmap'>
>>> x = np.ones((1,10000000))
>>> type(x)
<class 'numpy.ndarray'>
>>> arr[:] = x
>>> type(arr)
<class 'numpy.core.memmap.memmap'>
>>> arr = x
>>> type(arr)
<class 'numpy.ndarray'>
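
The same distinction can be checked against the file itself. This is only a minimal sketch: the scratch path and the names `mapped`, `on_disk` and `on_disk_after` are made up for illustration, and the array is kept small so it runs quickly.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "mm_demo")

arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))
x = np.ones((1, 1000))

arr[:] = x      # copies x into the mapped buffer (cast to float32)
arr.flush()     # push pending changes out to the file
on_disk = np.fromfile(path, dtype="float32")

mapped = arr    # keep a handle on the memmap before rebinding the name
arr = x         # rebinding: arr now refers to the in-memory ndarray x
arr += 1        # modifies x only, the file is untouched
mapped.flush()
on_disk_after = np.fromfile(path, dtype="float32")

print(type(mapped).__name__, type(arr).__name__)
print(on_disk.sum(), on_disk_after.sum())
```

After the rebinding, the file still holds the ones written by the slice assignment, while x has been incremented in memory.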

np.argsort returns an array of the same type as its argument, so in your code arr2 will be a memmapped array. In this specific case I would therefore expect no difference between arr = np.argsort(x) and arr[:] = np.argsort(x). But there is a difference.

>>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
>>> x = np.ones((1,10000000))
>>> arr[:] = x
>>> type(np.argsort(x))
<class 'numpy.ndarray'>
>>> type(np.argsort(arr))
<class 'numpy.core.memmap.memmap'>

OK, so what is different? With arr[:] = np.argsort(arr), every change to arr is followed by a change in the md5sum of the memmapped file.

>>> import os
>>> import numpy as np
>>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
>>> arr[:] = np.zeros((1,10000000))
>>> os.system("md5sum mm")
48e9a108a3ec623652e7988af2f88867  mm
0
>>> arr += 1.1
>>> os.system("md5sum mm")
b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
0
>>> arr[:] = np.argsort(arr)
>>> os.system("md5sum mm")
c3607e7de30240f3e0385b59491ac2ce  mm
0
>>> arr += 1.3
>>> os.system("md5sum mm")
1e6af2af114c70790224abe0e0e5f3f0  mm
0
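
If md5sum is not available (for example outside Linux), roughly the same check can be sketched with hashlib. The helper name `file_md5` and the scratch path are assumptions made for this sketch, and an explicit flush() is added before each hash so the result does not depend on when the operating system writes the pages back.

```python
import hashlib
import os
import tempfile

import numpy as np


def file_md5(path):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


path = os.path.join(tempfile.mkdtemp(), "mm_demo")
arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))

arr[:] = 0
arr.flush()
before = file_md5(path)

arr += 1.1                  # in-place update of the mapped buffer
arr.flush()
after_add = file_md5(path)

arr[:] = np.argsort(arr)    # result is copied into the mapped buffer
arr.flush()
after_sort = file_md5(path)

print(before, after_add, after_sort)
```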

We see that arr still retains its _mmap attribute.

>>> arr._mmap
<mmap.mmap object at 0x7f8e0f086198>

Now using arr = np.argsort(arr), the md5sums stop changing. Even though arr's type is still memmap, it is a new object, and the memory mapping appears to have been dropped.

>>> import os
>>> import numpy as np
>>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
>>> arr[:] = np.zeros((1,10000000))
>>> os.system("md5sum mm")
48e9a108a3ec623652e7988af2f88867  mm
0
>>> arr += 1.1
>>> os.system("md5sum mm")
b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
0
>>> arr = np.argsort(arr)
>>> os.system("md5sum mm")
b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
0
>>> arr += 1.3
>>> os.system("md5sum mm")
b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
0
>>> type(arr)
<class 'numpy.core.memmap.memmap'>

Now the '_mmap' attribute is None.

>>> arr._mmap
>>> type(arr._mmap)
<class 'NoneType'>
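
That observation can be wrapped in a small check. This is a sketch only: `is_file_backed` is a hypothetical helper, and it relies on the private _mmap attribute inspected above, which is not part of NumPy's documented API and may behave differently in other versions.

```python
import os
import tempfile

import numpy as np


def is_file_backed(a):
    """Best-effort check that `a` is a memmap still attached to a mapping.

    Assumption: uses the private `_mmap` attribute, as in the session above.
    """
    return isinstance(a, np.memmap) and getattr(a, "_mmap", None) is not None


path = os.path.join(tempfile.mkdtemp(), "mm_demo")
arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))

attached = is_file_backed(arr)               # the original mapping
detached = is_file_backed(np.argsort(arr))   # freshly allocated result
plain = is_file_backed(np.ones(3))           # ordinary ndarray

print(attached, detached, plain)
```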

Now part 3. It seems pretty easy to lose the reference to the memmapped object when doing complex operations. My current understanding is that you have to break the computation down and use arr[:] = <something> for the intermediate results.
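
A rough sketch of that approach for the argsort-of-argsort case follows. The file names, the small size n and the variable names are all made up for illustration. One caveat: np.argsort still builds its full result in RAM before the slice assignment copies it into the mapping, so this keeps the intermediates file-backed but does not by itself lower peak memory use.

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()    # hypothetical scratch directory
n = 1000                        # stand-in for a much larger array

data = np.memmap(os.path.join(workdir, "data"), dtype="float32",
                 mode="w+", shape=(n,))
data[:] = np.random.default_rng(0).random(n)

# Intermediate result 1: the sort order, stored in its own mapped file.
order = np.memmap(os.path.join(workdir, "order"), dtype="int64",
                  mode="w+", shape=(n,))
order[:] = np.argsort(data)

# Intermediate result 2: the ranks, i.e. argsort of the sort order.
rank = np.memmap(os.path.join(workdir, "rank"), dtype="int64",
                 mode="w+", shape=(n,))
rank[:] = np.argsort(order)

# Ufuncs can write straight into the mapping through out=, so the name
# never has to be rebound.
np.add(data, 1.3, out=data)

for m in (data, order, rank):
    m.flush()
```

Each name keeps referring to its own memmap throughout, because results are only ever written into the existing buffers.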

Tested with NumPy 1.8.1 and Python 3.4.1.

answered Sep 16 '22 11:09

A.P.
