 

How to grid a large xyz file with missing records without running out of memory

Tags:

python

numpy

grid

I have xyz text files that need to be gridded. For each xyz file I have the origin coordinates, the cell size and the number of rows/columns. However, records where there's no z value are missing from the xyz file, so simply creating a grid from the present records fails because of the missing values. So I tried this:

import numpy as np

nxyz = np.loadtxt(infile, delimiter=",", skiprows=1)

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25

grid = np.zeros((nrows,ncols))

for item in nxyz:
    idx = int((item[0]-xllcorner)/cellsize)
    idy = int((item[1]-yllcorner)/cellsize)
    grid[idy,idx] = item[2]

with open(r"e:\test\myrasout.txt","w") as outfile:
    np.savetxt(outfile, grid[::-1], fmt="%.2f", delimiter=" ")

This gets me the grid with zeroes where no records are present in the xyz file. It works for smaller files, but I got an out-of-memory error for a 290 MB file (~8,900,000 records). And that is not the largest file I have to process.

So I tried another (iterative) approach by Joe Kington that I found here for loading the xyz file. This worked for the 290 MB file, but failed with an out-of-memory error on the next bigger one (533 MB, ~15,600,000 records).

How can I grid these larger files correctly (accounting for the missing records) without running out of memory?

rr5577 asked Dec 06 '25
2 Answers

Based on the comments I'd change the code to

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25
grid = np.zeros((nrows,ncols))

with open(file) as f:
    next(f)  # skip the header row, as skiprows=1 did
    for line in f:
        item = [float(v) for v in line.split(",")]  # adjust the separator to your data
        idx = int((item[0]-xllcorner)/cellsize)
        idy = int((item[1]-yllcorner)/cellsize)
        grid[idy,idx] = item[2]
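For reference, here is a minimal self-contained sketch of this streaming approach as a function (the function name, the float32 dtype and the example path are my assumptions, not from the answer):

```python
import numpy as np

def grid_xyz(lines, nrows, ncols, xllcorner, yllcorner, cellsize, sep=","):
    """Fill a grid from an iterable of xyz lines, streaming one line at a
    time so only the output array has to fit in memory."""
    grid = np.zeros((nrows, ncols), dtype=np.float32)  # float32 halves grid memory
    it = iter(lines)
    next(it)  # skip the header row (as skiprows=1 did)
    for line in it:
        x, y, z = (float(v) for v in line.split(sep))
        grid[int((y - yllcorner) / cellsize),
             int((x - xllcorner) / cellsize)] = z
    return grid

# usage (path is just an example):
# with open(r"e:\test\infile.xyz") as f:
#     grid = grid_xyz(f, 4405, 4781, 682373.533843, 205266.898604, 1.25)
```

Peak memory is then dominated by the output grid itself, since no array of all input points is ever built.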
LarsVegas answered Dec 08 '25

You can do fancy indexing with NumPy. Try something like this instead of the loop, which is probably the root of your problem (the index arrays must be integers, and rows correspond to y as in your `grid[idy,idx]`):

grid = np.zeros((nrows,ncols))
grid[nxyz[:,1].astype(int), nxyz[:,0].astype(int)] = nxyz[:,2]

With the origin and cell size conversion, it is a bit more involved:

grid = np.zeros((nrows,ncols))
iy = ((nxyz[:,1]-yllcorner)/cellsize).astype(int)
ix = ((nxyz[:,0]-xllcorner)/cellsize).astype(int)
grid[iy,ix] = nxyz[:,2]
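As a quick sanity check, the fancy-indexing assignment with the origin/cell-size conversion can be tried on a tiny array (the three sample points below are made up for illustration):

```python
import numpy as np

# three made-up sample points: x, y, z
nxyz = np.array([[0.0,  0.0,  1.5],
                 [2.5,  1.25, 3.0],
                 [1.25, 2.5,  7.0]])
xllcorner, yllcorner, cellsize = 0.0, 0.0, 1.25
nrows, ncols = 3, 3

grid = np.zeros((nrows, ncols))
iy = ((nxyz[:, 1] - yllcorner) / cellsize).astype(int)
ix = ((nxyz[:, 0] - xllcorner) / cellsize).astype(int)
grid[iy, ix] = nxyz[:, 2]  # rows indexed by y, columns by x, as in the question
```

Cells with no matching record simply keep their initial zero, which is exactly the behaviour the question's loop produced.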

If this doesn't help, the nxyz array is too big, but I doubt that. If it is, then you could load the text file in several parts and do the above for each part sequentially.
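A sketch of that "several parts" idea, assuming `itertools.islice` to pull a fixed number of lines per chunk (the function name and the default chunk size are arbitrary choices of mine):

```python
import numpy as np
from io import StringIO
from itertools import islice

def grid_in_chunks(f, nrows, ncols, xllcorner, yllcorner, cellsize,
                   chunksize=1_000_000):
    """Read the open file f a fixed number of lines at a time and apply
    the fancy-indexing assignment chunk by chunk, so only one chunk of
    points is in memory at once."""
    grid = np.zeros((nrows, ncols))
    while True:
        chunk = list(islice(f, chunksize))
        if not chunk:
            break
        # ndmin=2 keeps a one-line final chunk two-dimensional
        nxyz = np.loadtxt(StringIO("".join(chunk)), delimiter=",", ndmin=2)
        iy = ((nxyz[:, 1] - yllcorner) / cellsize).astype(int)
        ix = ((nxyz[:, 0] - xllcorner) / cellsize).astype(int)
        grid[iy, ix] = nxyz[:, 2]
    return grid
```

Each chunk is gridded and discarded before the next one is read, so the working set stays bounded regardless of the file size.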

P.S. You probably know the range of the data contained in your text files, and you can limit memory usage by explicitly stating it while loading the file, like so if you are dealing with at most 16-bit integers: np.loadtxt("myfile.txt", dtype=np.int16).
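A small illustration of the memory difference (the sample values are made up; the dtype must of course fit your data's actual range):

```python
import numpy as np
from io import StringIO

data = StringIO("1 2 300\n4 5 600\n")
arr16 = np.loadtxt(data, dtype=np.int16)  # 2 bytes per value
data.seek(0)
arr64 = np.loadtxt(data)                  # default float64, 8 bytes per value
# arr16 needs a quarter of the memory of arr64 for the same values
```

For xyz data with fractional coordinates, np.float32 would be the analogous space-saving choice.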

Karol answered Dec 08 '25
