Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy.memmap from numpy operations

I am working with rather large arrays created from large image files. I was having issues with using too much memory and decided to try using numpy.memmap arrays instead of the standard numpy.array. I was able to create a memmap and load the data into it from my image file in chunks, but I'm not sure how to load the result of an operation into a memmap.

For example, my image files are read into numpy as binary integer arrays. I have written a function that buffers (expands) any region of True cells by a specified number of cells. This function converts the input array to Boolean using array.astype(bool). How would I make the new Boolean array created by array.astype(bool) a numpy.memmap array?

Also, if there is a True cell closer to the edge of the input array than the specified buffer distance, the function will add rows and/or columns to the edge of the array to allow for a complete buffer around the existing True cell. This changes the shape of the array. Is it possible to change the shape of a numpy.memmap?

Here is my code:

def getArray(dataset):
    '''Dataset is an instance of the GDALDataset class from the
    GDAL library for working with geospatial datasets

    '''
    chunks = readRaster.GetArrayParams(dataset, chunkSize=5000)
    datPath = re.sub(r'\.\w+$', '_temp.dat', dataset.GetDescription())
    pathExists = path.exists(datPath)
    arr = np.memmap(datPath, dtype=int, mode='r+',
                    shape=(dataset.RasterYSize, dataset.RasterXSize))
    if not pathExists:
        for chunk in chunks:
            xOff, yOff, xWidth, yWidth = chunk
            chunkArr = readRaster.GetArray(dataset, *chunk)
            arr[yOff:yOff + yWidth, xOff:xOff + xWidth] = chunkArr
    return arr

def Buffer(arr, dist, ring=False, full=True):
    '''Applies a buffer to any non-zero raster cells'''
    arr = arr.astype(bool)
    nzY, nzX = np.nonzero(arr)
    minY = np.amin(nzY)
    maxY = np.amax(nzY)
    minX = np.amin(nzX)
    maxX = np.amax(nzX)
    if minY - dist < 0:
        arr = np.vstack((np.zeros((abs(minY - dist), arr.shape[1]), bool),
                         arr))
    if maxY + dist >= arr.shape[0]:
        arr = np.vstack((arr,
                         np.zeros(((maxY + dist - arr.shape[0] + 1), arr.shape[1]), bool)))
    if minX - dist < 0:
        arr = np.hstack((np.zeros((arr.shape[0], abs(minX - dist)), bool),
                         arr))
    if maxX + dist >= arr.shape[1]:
        arr = np.hstack((arr,
                         np.zeros((arr.shape[0], (maxX + dist - arr.shape[1] + 1)), bool)))
    if dist >= 0: buffOp = binary_dilation
    else: buffOp = binary_erosion
    bufDist = abs(dist) * 2 + 1
    k = np.ones((bufDist, bufDist))
    bufArr = buffOp(arr, k)
    return bufArr.astype(int)
like image 655
Brian Avatar asked Sep 30 '13 15:09

Brian


1 Answers

Let me try to answer the first part of your question. Loading a result into a memmap datastore.

Note I am going to assume that there is a memmap file already on disk - it will be the input file. Called MemmapInput, created as follows:

fpInput = np.memmap('MemmapInput', dtype='bool', mode='w+', shape=(3,4))
del fpInput
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='w+', shape=(3,4))
del fpOutput

In your case the output file might not be present, but per the docs: 'r+' Open existing file for reading and writing.

‘w+’ Create or overwrite existing file for reading and writing.

So the first time you create a memmap file it must be with a 'w+', thereafter to modify/overwrite the file, use 'r+', Readonly copies can be obtained with 'r'. See http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html for more info.

Now we will read in this file and perform some operations on it. The main point is to load a result into a memamp file, the memmap file must first be created and attached to a file.

fpInput = np.memmap('MemmapInput', dtype='bool', mode='r', shape=(3,4))
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='r+', shape=(3,4))

Do whatever you want with the fpOutput memmap file e.g.:

i,j = numpy.nonzero(fpInput==True)
for indexI in i:
  for indexJ in j:
    fpOutput[indexI-1,indexJ] = True
    fpOutput[indexI, indexJ-1] = True
    fpOutput[indexI+1, indexJ] = True
    fpOutput[indexI, indexJ+1] = True
like image 107
Paul Avatar answered Oct 08 '22 18:10

Paul