Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

combining spatial netcdf files using xarray python

Is there a way to merge 2 or more netCDF files with the same time dimension but different spatial domains into a single netCDF file? The spatial domains are specified by latitude and longitude coordinates? In the documentation for xarray concat, merge etc., they say that they work with a single dimension

like image 794
user308827 Avatar asked Jul 23 '18 21:07

user308827


2 Answers

xarray now supports multi-dimensional concatenation directly through open_mfdataset.

The documentation on combining data along multiple dimensions is here, but as your question is very similar to this one, I'm going to copy the key parts of my answer here:


You have a 2D concatenation problem: you need to arrange the datasets such that when joined up along x and y, they make a larger dataset which also has dimensions x and y.

As long as len(x) is the same in every file, and len(y) is the same in every file, you should in theory be able to do this in one or two different ways.

1) Using combine='nested'

You can manually specify the order that you need them joined up in. xarray allows you to do this by passing the datasets as a grid, specified as a nested list. In your case, if we had 4 files (named [upper_left, upper_right, lower_left, lower_right]), we would combine them like so:

from xarray import open_mfdataset

grid = [[upper_left, upper_right], 
        [lower_left, lower_right]]

ds = open_mfdataset(grid, concat_dim=['x', 'y'], combine='nested')

We had to tell open_mfdataset which dimensions of the data the rows and colums of the grid corresponded to, so it would know which dimensions to concatenate the data along. That's why we needed to pass concat_dim=['x', 'y'].

2) Using combine='by_coords'

But your data has coordinates in it already - can't xarray just use those to arrange the datasets in the right order? That is what the combine='by_coords' option is for, but unfortunately, it requires 1-dimensional coordinates (also known as dimensional coordinates) to arrange the data. If your files don't have any of those the printout will says Dimensions without coordinates: x, y).

If you can add 1-dimensional coordinates to your files first, then you could use combine='by_coords', then you could just pass a list of all the files in any order, i.e.

ds = open_mfdataset([file1, file2, ...], combine='by_coords')

But otherwise you'll have to use combine='nested'.

like image 147
ThomasNicholas Avatar answered Oct 23 '22 03:10

ThomasNicholas


I don't know an "automated" way to do this in python (or R, FORTRAN), only manually reading in the files to a larger array and then writing out that array to a new netcdf file, but there is a more "automated" to do it from the command line using CDO.

If you define a domain description file grid.txt that contains the two (or more) files regions:

gridtype = lonlat
gridsize = 420
xname = lon
xlongname = longitude
xunits = degrees east
yname = lat
ylongname = latitude
yunits = degrees north
xsize = 21
ysize = 20
xfirst = -11.0
xinc = 1
yfirst = -20.0
yinc = 1

and then you "expand" the first file file1.nc to the larger domain and then merge in the contents of both netcdf files:

cdo expand,grid.txt file1.nc large.nc
cdo mergegrid large.nc file1.nc merge1.nc
cdo mergegrid merge1.nc file2.nc final_merge.nc 

I found this soln here: https://code.mpimet.mpg.de/boards/1/topics/26 and have used it when I need to merge 2 or 3 files together. However when I needed to merge many hundred of files together containing e.g. one latitude row of data each, I wrote a manual programme (in R in my case).

like image 1
Adrian Tompkins Avatar answered Oct 23 '22 03:10

Adrian Tompkins