I am reading an x,y,z point file (LAS) into Python and have run into memory errors. I am interpolating unknown points between known points for a project I am working on. I began working with small files (< 5,000,000 points) and was able to read/write to a NumPy array and to Python lists with no problem. I have since received more data to work with (> 50,000,000 points), and now my code fails with a MemoryError.
What are some options for handling such large amounts of data? I do not have to load all of the data into memory at once, but I will need to look at neighboring points using a SciPy kd-tree. I am using 32-bit Python 2.7 on a 64-bit Windows XP OS.
Thanks in advance.
EDIT: Code is posted below. I took out code for long calculations and variable definitions.
from liblas import file
import numpy as np

f = file.File(las_file, mode='r')
num_points = int(f.__len__())

# x, y, z coordinates, intensity, classification, and GPS time for each point
dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]
xyzict = np.empty(shape=(num_points,), dtype=dt)

counter = 0
for p in f:
    newrow = (p.x, p.y, p.z, p.intensity, p.classification, p.time)
    xyzict[counter] = newrow
    counter += 1

dropoutList = []
counter = 0
for i in np.nditer(xyzict):
    # code to define P1x, P1y, P1z, P1t
    if counter != 0:
        # code to calculate n, tDiff, and seconds
        if n > 1 and n < scanN:
            # code to find v and vD
            for d in range(1, int(n - 1)):
                # Code to interpolate x, y, z for points between P0 and P1
                # Append tuple of x, y, and z to dropoutList
                dropoutList.append(vD)
    # code to set x, y, z, t for next iteration
    counter += 1
Regardless of the amount of RAM in your system, if you are running 32-bit Python you will have a practical limit of about 2 GB of RAM for your application. There are a number of other questions on SO that address this (e.g., see here). Since each record in your ndarray is 23 bytes and you are reading over 50,000,000 points, that already puts you at about 1 GB. You haven't included the rest of your code, so it isn't clear how much additional memory is consumed by other parts of your program.
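For reference, here is a quick way to verify that arithmetic yourself (a small sketch using the dtype from your question; the point count is just the 50,000,000 figure mentioned above):

import numpy as np

dt = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')]

record_size = np.dtype(dt).itemsize        # 4 + 4 + 4 + 2 + 1 + 8 = 23 bytes per point
num_points = 50000000                      # point count from the question
print(record_size)                         # 23
print(record_size * num_points / 2.0**30)  # roughly 1.07 GiB for the array alone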
If you have well over 2 GB of RAM in your system and you will continue to work on large data sets, you should install 64-bit Python to get around this ~2 GB limit.
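If you are not sure which interpreter you are running, you can check the pointer size of the current build (a generic snippet, not specific to this problem):

import struct
import sys

print(struct.calcsize("P") * 8)  # prints 32 on a 32-bit build, 64 on a 64-bit build
print(sys.maxsize > 2**32)       # True only on a 64-bit build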
Save the points in a binary file on disk and then use numpy.memmap. That will be a bit slower, but it might not hurt much (depending on the algorithm).
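A minimal sketch of what that could look like, reusing the dtype, num_points, and the liblas file object f from your question (the file name points.dat is just a placeholder):

import numpy as np

dt = np.dtype([('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
               ('i', 'u2'), ('c', 'u1'), ('t', 'datetime64[us]')])

# disk-backed array instead of an in-memory ndarray
xyzict = np.memmap('points.dat', dtype=dt, mode='w+', shape=(num_points,))

# fill it like a normal array; the OS pages data out to disk as needed
for counter, p in enumerate(f):
    xyzict[counter] = (p.x, p.y, p.z, p.intensity, p.classification, p.time)
xyzict.flush()

# later runs can reopen the same file without loading it all into RAM
xyzict = np.memmap('points.dat', dtype=dt, mode='r', shape=(num_points,))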
Or try the 64-bit version of Python; your data set probably needs more than 2 GB of memory.
Lastly, check how your code works with the data. With that many elements, you shouldn't copy or clone the array; use views instead.
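To illustrate the difference (a generic example, not taken from your code): slicing returns a view that shares the original buffer, while boolean/fancy indexing and .copy() allocate new memory:

import numpy as np

a = np.zeros(10000000, dtype='f4')

b = a[::2]           # slicing returns a view; no new data buffer is allocated
b[0] = 1.0           # writes through to a[0]

c = a[a > 0.5]       # boolean indexing copies the selected elements
d = a.copy()         # an explicit copy doubles the memory for those elements

print(b.base is a)   # True: b shares a's buffer
print(c.base is a)   # False: c owns its own buffer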
If everything else fails, try a 64-bit version of Linux (since you won't get a 64-bit Windows for free).