Suppose I have a directory with thousands of GRIB files. I want to load those files into a dask array so I can query them. How can I go about doing this? The attempt below seems to work, but it requires each GRIB file to be opened up front, so it takes a long time to run and consumes all of my memory. There must be a better way.
My attempt:
import dask.array as da
from dask import delayed
import gdal
import glob
import os

def load(filedir):
    files = sorted(glob.glob(os.path.join(filedir, '*.grb')))
    data = [da.from_array(gdal.Open(f).ReadAsArray(), chunks=[500, 500, 500], name=f)
            for f in files]
    return da.stack(data, axis=0)

file_dir = ...
array = load(file_dir)
The best way to do this would be to use dask.delayed. In this case, you'd create a delayed function to read each array, and then compose a dask array from those delayed objects using the da.from_delayed function. Something along the lines of:
import dask
import dask.array as da
import gdal

# This function isn't run until compute time
@dask.delayed(pure=True)
def load(file):
    return gdal.Open(file).ReadAsArray()

# Create several delayed objects, then turn each into a dask array.
# Note that you need to know the shape and dtype of each file.
data = [da.from_delayed(load(f), shape=shape_of_f, dtype=dtype_of_f)
        for f in files]
x = da.stack(data, axis=0)
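If your files all happen to share the same shape and dtype, one way to fill in shape_of_f and dtype_of_f (a minimal sketch, assuming homogeneous files; eagerly reading one sample file is the only up-front cost) is to inspect a single file once:

import dask
import dask.array as da
import gdal
import glob
import os

@dask.delayed(pure=True)
def load(file):
    return gdal.Open(file).ReadAsArray()

files = sorted(glob.glob(os.path.join(file_dir, '*.grb')))

# Assumption: every file has the same shape and dtype, so reading one
# sample array eagerly gives the metadata for all of them.
sample = gdal.Open(files[0]).ReadAsArray()

data = [da.from_delayed(load(f), shape=sample.shape, dtype=sample.dtype)
        for f in files]
x = da.stack(data, axis=0)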
Note that this makes a single task for loading each file. If the individual files are large, you may want to chunk them yourself in the load function. I'm not familiar with gdal, but from a brief look at the ReadAsArray method this may be doable with the xoff/yoff/xsize/ysize parameters (not sure). You'd have to write this code yourself, but it may be more performant for large files.
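For what it's worth, here's a rough sketch of what that windowed reading could look like, assuming a 2D single-band raster and that ReadAsArray accepts xoff/yoff/xsize/ysize window arguments as suggested above (the load_window and load_chunked helpers are hypothetical names, not part of gdal or dask):

import dask
import dask.array as da
import gdal

@dask.delayed(pure=True)
def load_window(file, xoff, yoff, xsize, ysize):
    # Read only a rectangular window of the raster, not the whole file
    return gdal.Open(file).ReadAsArray(xoff, yoff, xsize, ysize)

def load_chunked(file, shape, dtype, chunk=500):
    # shape is (ysize, xsize) for a 2D raster; build one delayed
    # read per window, clipping the last window at each edge
    rows = []
    for yoff in range(0, shape[0], chunk):
        row = []
        for xoff in range(0, shape[1], chunk):
            ysize = min(chunk, shape[0] - yoff)
            xsize = min(chunk, shape[1] - xoff)
            d = load_window(file, xoff, yoff, xsize, ysize)
            row.append(da.from_delayed(d, shape=(ysize, xsize), dtype=dtype))
        rows.append(row)
    # Stitch the windows back together into a single dask array
    return da.block(rows)

Each window becomes its own task, so reads can be parallelized within a single large file as well as across files.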
Alternatively you could use the code above, and then call rechunk to rechunk into smaller chunks. This would still result in reading each file in a single task, but subsequent steps could work with smaller chunks. Whether this is worth it or not depends on the size of your individual files.
x = x.rechunk((500, 500, 500)) # or whatever chunks you want