Join/merge multiple NetCDF files using xarray

I have a folder with NetCDF files covering 2006-2100, in ten-year blocks (2011-2020, 2021-2030, etc.).

I want to create a new NetCDF file which contains all of these files joined together. So far I have read in the files:

import xarray

ds = xarray.open_dataset('Path/to/file/20062010.nc')
ds1 = xarray.open_dataset('Path/to/file/20112020.nc')
# ...and so on for each remaining file

Then merged these like this:

dsmerged = xarray.merge([ds,ds1])

This works, but is clunky and there must be a simpler way to automate this process, as I will be doing this for many different folders full of files. Is there a more efficient way to do this?

EDIT:

Trying to join these files using glob:

import glob
import xarray

for filename in glob.glob('path/to/file/*.nc'):
    dsmerged = xarray.merge([filename])

Gives the error:

AttributeError: 'str' object has no attribute 'items'

This is reading only the text of the filename, and not the actual file itself, so it can't merge it. How do I open, store as a variable, then merge without doing it bit by bit?

Pad asked Nov 10 '17 at 15:11




1 Answer

If you are looking for a clean way to merge all of your datasets, you can open each file inside a list comprehension and pass the resulting list of datasets to the xarray.merge function. For example:

import glob
import xarray

ds = xarray.merge([xarray.open_dataset(f) for f in glob.glob('path/to/file/*.nc')])
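To see what xarray.merge does with ten-year blocks, here is a minimal, self-contained sketch that builds the blocks in memory instead of reading them from disk. The helper make_block and the variable name tas are hypothetical stand-ins for the real files; with real data each block would come from xarray.open_dataset(f) for f in sorted(glob.glob('path/to/file/*.nc')). The sketch passes join and compat explicitly so the result does not hinge on xarray's defaults, and compares the merge with xarray.concat along time, an alternative for blocks that differ only in time.

```python
import numpy as np
import pandas as pd
import xarray


def make_block(start_year, end_year):
    """Hypothetical stand-in for one file: a yearly 'tas' series whose values equal the year."""
    time = pd.date_range(f"{start_year}-01-01", f"{end_year}-01-01", freq="YS")
    values = np.arange(len(time), dtype=float) + start_year
    return xarray.Dataset({"tas": ("time", values)}, coords={"time": time})


# With real files, each block would come from xarray.open_dataset(path).
blocks = [make_block(2006, 2010), make_block(2011, 2020), make_block(2021, 2030)]

# join="outer" keeps the union of all time stamps; compat="no_conflicts"
# fills each block's missing time steps with values from the other blocks.
merged = xarray.merge(blocks, join="outer", compat="no_conflicts")

# Alternative for blocks that differ only along time: concatenate directly.
concatenated = xarray.concat(blocks, dim="time")
```

Both merged and concatenated end up with 25 yearly time steps (2006 to 2030) and no gaps.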

If you run into out-of-memory errors with that approach, it is probably because the combined data is larger than the Python process can hold in memory. The best fix is the xarray.open_mfdataset function, which uses the dask library under the hood (so dask must be installed) to split the data into smaller chunks for processing. This is usually much more memory efficient and will often let you bring data into Python that would not fit otherwise. With this function you do not need a loop or list comprehension; you can pass it a glob string of the form "path/to/my/files/*.nc". For files like these, which differ only along time, the following gives the same combined dataset as the solution above, but is more memory efficient:

import xarray

ds = xarray.open_mfdataset('path/to/file/*.nc')
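Since the question also asks for a new NetCDF file and mentions many folders, the open_mfdataset call can be wrapped in a small helper and followed by Dataset.to_netcdf. This is a sketch under assumptions: the function names combine_folder and combine_many, the _combined.nc suffix and the separate output directory are hypothetical choices, and open_mfdataset needs dask installed. Writing the outputs outside the input folders keeps a rerun from picking up a combined file through the *.nc pattern.

```python
import glob
import os

import xarray


def combine_folder(folder, out_path):
    """Lazily open every *.nc file in `folder` and write one combined file (hypothetical helper)."""
    pattern = os.path.join(folder, "*.nc")
    if not glob.glob(pattern):
        raise FileNotFoundError(f"no .nc files match {pattern}")
    # combine="by_coords" orders and concatenates the files by their
    # coordinate values (here: time), so file-name order does not matter.
    with xarray.open_mfdataset(pattern, combine="by_coords") as ds:
        ds.to_netcdf(out_path)
    return out_path


def combine_many(folders, out_dir):
    """Write one '<folder>_combined.nc' per input folder into `out_dir` (hypothetical layout)."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for folder in folders:
        name = os.path.basename(os.path.normpath(folder)) + "_combined.nc"
        outputs.append(combine_folder(folder, os.path.join(out_dir, name)))
    return outputs
```

For example, combine_many(['path/to/folder_a', 'path/to/folder_b'], 'path/to/combined') would write folder_a_combined.nc and folder_b_combined.nc into path/to/combined.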

I hope this proves useful.

Abdou answered Sep 28 '22 at 17:09
