
What are the workaround options for a Python out-of-memory error?

I am reading an x,y,z point file (LAS) into Python and have run into memory errors. I am interpolating unknown points between known points for a project I am working on. I began working with small files (< 5,000,000 points) and was able to read/write to a numpy array and Python lists with no problem. I have received more data to work with (> 50,000,000 points) and now my code fails with a MemoryError.

What are some options for handling such large amounts of data? I do not have to load all the data into memory at once, but I will need to look at neighboring points using a scipy kd-tree. I am using 32-bit Python 2.7 on a 64-bit Windows XP OS.

Thanks in advance.

EDIT: Code is posted below. I took out code for long calculations and variable definitions.

from liblas import file
import numpy as np

f = file.File(las_file, mode='r')
num_points = len(f)

# one record per point: x/y/z coordinates, intensity, classification, timestamp
dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]
xyzict = np.empty(shape=(num_points,), dtype=dt)
counter = 0
for p in f:
    newrow = (p.x, p.y, p.z, p.intensity, p.classification, p.time)
    xyzict[counter] = newrow
    counter += 1

dropoutList = []
counter = 0
for i in np.nditer(xyzict):
    # code to define P1x, P1y, P1z, P1t
    if counter != 0:
        # code to calculate n, tDiff, and seconds 
        if n > 1 and n < scanN:
            # code to find v and vD
            for d in range(1, int(n-1)):
                # Code to interpolate x, y, z for points between P0 and P1
                # Append tuple of x, y, and z to dropoutList
                dropoutList.append(vD)
    # code to set x, y, z, t for next iteration
    counter += 1
asked by Barbarossa on Mar 21 '23


2 Answers

Regardless of the amount of RAM in your system, if you are running 32-bit python, you will have a practical limit of about 2 GB of RAM for your application. There are a number of other questions on SO that address this (e.g., see here). Since the structure you are using in your ndarray is 23 bytes and you are reading over 50,000,000 points, that already puts you at about 1 GB. You haven't included the rest of your code so it isn't clear how much additional memory is being consumed by other parts of your program.
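As a quick sanity check, you can ask numpy how big one of your records is and multiply by the point count (the count below is just the figure quoted in the question):

import numpy as np

dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
      ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]

num_points = 50000000                                          # figure from the question
bytes_needed = np.dtype(dt).itemsize * num_points
print("record size: %d bytes" % np.dtype(dt).itemsize)         # 23 bytes
print("array alone: %.2f GB" % (bytes_needed / 1024.0 ** 3))   # roughly 1.07 GB

And that is before counting dropoutList, temporaries created during the interpolation, and the interpreter itself.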

If you have well over 2 GB of RAM in your system and you will continue to work on large data sets, you should install 64-bit python to get around this ~ 2 GB limit.
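If you are not sure which interpreter you are actually running, here is a quick, generic check (nothing specific to your script):

import sys
import platform

print(platform.architecture()[0])   # '32bit' or '64bit'
print(sys.maxsize > 2**32)          # True on a 64-bit build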

answered by bogatron on Mar 29 '23


Save the points in a binary file on disk and then use numpy.memmap. That will be a bit slower but might not hurt much (depending on the algorithm).
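A minimal sketch of that approach, reusing the dtype from the question (the filename and point count here are just placeholders):

import numpy as np

dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
      ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]
num_points = 50000000  # known up front, e.g. from len(f)

# Disk-backed array: only the pages you actually touch get paged into RAM.
xyzict = np.memmap('points.dat', dtype=dt, mode='w+', shape=(num_points,))

# Fill it exactly like a normal array, e.g. inside your liblas loop:
#     xyzict[counter] = (p.x, p.y, p.z, p.intensity, p.classification, p.time)

xyzict.flush()  # push pending writes to disk

# Reopen read-only later (or from another process) without loading it all:
xyzict = np.memmap('points.dat', dtype=dt, mode='r', shape=(num_points,))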

Or try the 64-bit version of Python; you probably need more than 2 GB of memory for your data.

Lastly, check how your code works with the data. With that many elements, you shouldn't copy or clone the array; use views instead.
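For example, basic slicing gives you a view that shares the original array's memory, while fancy indexing silently makes a copy:

import numpy as np

a = np.arange(10)

v = a[2:8]        # slice -> view, no new data allocated
v[0] = 99         # also changes a[2]

c = a[[2, 3, 4]]  # fancy indexing -> copy, duplicates that data
c[0] = -1         # a is untouched

print(a)          # [ 0  1 99  3  4  5  6  7  8  9]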

If everything else fails, try a 64-bit version of Linux (since you won't get 64-bit Windows for free).

answered by Aaron Digulla on Mar 29 '23