I'm trying to better understand how numpy's memmap handles views of very large files. The script below opens a memory-mapped 2048^3 array and copies a downsampled 128^3 view of it:
import numpy as np
from time import time

FILE = '/Volumes/BlackBox/test.dat'
array = np.memmap(FILE, mode='r', shape=(2048, 2048, 2048), dtype=np.float64)

t = time()
for i in range(5):
    # np.array() copies the strided memmap view into RAM
    view = np.array(array[::16, ::16, ::16])
t = ((time() - t) / 5) * 1000
print("Time (ms): %i" % t)
Usually, this prints "Time (ms): 80" or so. However, if I change the view assignment to
view = np.array(array[1::16, 2::16, 3::16])
and run it three times, I get the following:
Time (ms): 9988
Time (ms): 79
Time (ms): 78
Does anybody understand why the first invocation is so much slower?
NumPy's memmap() function creates a memory map to an array stored in a binary file on disk. Memory-mapped files are used to access small segments of large files on disk without reading the entire file into memory.
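A minimal sketch of this lazy behaviour, assuming a small hypothetical 64^3 file in a temporary directory in place of the 64 GiB file from the question. Slicing the memmap only creates another file-backed view; it is np.array() that copies the selected elements (and so reads their pages) into memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical small file standing in for the 2048^3 dataset in the question.
path = os.path.join(tempfile.mkdtemp(), "small.dat")
shape = (64, 64, 64)

# mode="w+" creates (or overwrites) the file and maps it read/write.
writer = np.memmap(path, mode="w+", shape=shape, dtype=np.float64)
writer[:] = np.arange(writer.size, dtype=np.float64).reshape(shape)
writer.flush()  # make sure the data is written back to the file
del writer

# Re-open read-only; mapping the file does not read its contents yet.
arr = np.memmap(path, mode="r", shape=shape, dtype=np.float64)

# Slicing is lazy: the result is still a memmap backed by the file.
strided = arr[1::16, 2::16, 3::16]

# np.array() makes an in-memory copy, which is when the pages are read.
copy = np.array(strided)
print(copy.shape)  # (4, 4, 4)
```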
Python's mmap objects behave like both bytearray objects and file objects. You can use an mmap object in most places where a bytearray is expected; for example, you can use the re module to search through a memory-mapped file.
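A short sketch of both behaviours using the standard-library mmap module; the file name and its contents are made up for illustration.

```python
import mmap
import os
import re
import tempfile

# Hypothetical file contents for illustration.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "wb") as f:
    f.write(b"header\nERROR 42: disk full\nok\n")

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # bytearray-like: slicing returns bytes.
        first_line = mm[:6]
        # re accepts the mmap directly (the pattern must be bytes).
        match = re.search(rb"ERROR (\d+)", mm)
        code = int(match.group(1))
        # file-like: seek() and readline() use the mmap's own position.
        mm.seek(0)
        header = mm.readline()

print(first_line, code, header)
```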
You can save NumPy arrays to CSV files using the savetxt() function, which takes a filename and an array as arguments. Because savetxt() separates values with a space by default, you must also specify the delimiter: the character used to separate each value in the file, which for CSV is a comma.
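A minimal example, assuming a small made-up array and a temporary file path. The header= and comments="" arguments are optional; they are used here only to write a plain CSV header row without the default "# " prefix.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# delimiter="," turns the default space-separated output into CSV;
# fmt controls how each value is printed.
np.savetxt(path, data, delimiter=",", fmt="%.2f", header="a,b,c", comments="")

with open(path) as f:
    text = f.read()

# Read it back, skipping the header row.
loaded = np.loadtxt(path, delimiter=",", skiprows=1)
print(text)
```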
As far as I understand, there are currently two ways to close a memmap "file": del fp or fp._mmap.close(). However, the former only closes the file if fp is the only reference to the memmap, and the latter crashes the Python interpreter if another reference to the memmap still exists.
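A sketch of why del fp alone may not close the file, assuming a small temporary file. It inspects _mmap, which is a private NumPy attribute that may change between versions. A view sliced from fp shares that mapping, while a copy made with np.array() does not. Flushing, then dropping every memmap that references the mapping, lets CPython's reference counting unmap the file.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "buf.dat")

fp = np.memmap(path, mode="w+", shape=(1000,), dtype=np.float64)
fp[:] = 1.0
fp.flush()  # write modified pages back to the file

view = fp[::10]             # still backed by the same mapping
copy = np.array(fp[::10])   # independent in-memory ndarray, no mapping

# _mmap is private; used here only to show the view shares fp's mapping.
shares_mapping = view._mmap is fp._mmap

# Deleting fp does not unmap the file, because view still references it.
del fp
still_readable = float(view[0])

# Dropping the last memmap that references the mapping releases it (CPython).
del view

on_disk = np.fromfile(path, dtype=np.float64)
```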
The OS still has portions (or all) of the mapped file cached in physical RAM. The first read has to access the disk, which is much slower than accessing RAM. Do enough other disk I/O and you'll find that your time climbs back toward the original slow one, because the OS has to re-read from disk the parts it no longer has cached.
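To see why shifting the start offsets made the first run slow, it helps to count which file pages each view reads. The sketch below assumes the shape and dtype from the question, C-order layout, and a 4 KiB page size (some systems, such as Apple Silicon Macs, use 16 KiB pages), and it ignores OS readahead. Under those assumptions each 128^3 view reads 65,536 pages (256 MiB of file data for a 16 MiB result), and the two views share no pages at all, so the cache warmed by array[::16, ::16, ::16] does not help array[1::16, 2::16, 3::16].

```python
import numpy as np

# Assumptions: shape and dtype from the question, C order, 4 KiB pages.
SHAPE = (2048, 2048, 2048)
ITEMSIZE = 8  # float64
PAGE_SIZE = 4096

# C-order byte strides of the on-disk array.
STRIDES = (SHAPE[1] * SHAPE[2] * ITEMSIZE, SHAPE[2] * ITEMSIZE, ITEMSIZE)


def pages_touched(starts, step=16):
    """Return the sorted file page indices read by a[s0::step, s1::step, s2::step]."""
    i, j, k = (np.arange(s, n, step, dtype=np.int64) for s, n in zip(starts, SHAPE))
    # Byte offset of every selected element, computed by broadcasting.
    offsets = (i[:, None, None] * STRIDES[0]
               + j[None, :, None] * STRIDES[1]
               + k[None, None, :] * STRIDES[2])
    return np.unique(offsets // PAGE_SIZE)


first = pages_touched((0, 0, 0))
second = pages_touched((1, 2, 3))
overlap = np.intersect1d(first, second).size

print(len(first), len(second), overlap)  # 65536 65536 0
print(len(first) * PAGE_SIZE // 2**20, "MiB of pages for a 16 MiB result")
```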