 

How to join data from multiple netCDF files with xarray in Python?

I'm trying to open multiple netCDF files with xarray in Python. The files contain data with the same shape, and I want to join them along a new dimension.

I tried the concat_dim argument of xarray.open_mfdataset(), but it doesn't work as expected. The example below opens two files with temperature data for 124 times, 241 latitudes and 480 longitudes:

import xarray as xr

DS = xr.open_mfdataset( 'eraINTERIM_t2m_*.nc', concat_dim='cases' )
da_t2m = DS.t2m

print( da_t2m )

With this code, I expected the resulting data array to have a shape like (cases: 2, time: 124, latitude: 241, longitude: 480). However, its shape was (cases: 2, time: 248, latitude: 241, longitude: 480). It creates the new dimension, but it also joins the 'time' dimension of the two datasets end to end, doubling its length. I was wondering whether this is an error in 'xarray.open_mfdataset' or expected behavior because the 'time' dimension is UNLIMITED in both datasets.

Is there a way to join the data from these files directly with xarray and get the expected result above?

Thank you.

Mateus

Mateus da Silva Teixeira asked Apr 01 '19 14:04



1 Answer

Extending from my comment I would try this:

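import xarray as xr

def preproc(ds):
    # keep the timestamps in 'stime', then turn 'time' into a plain dimension 'ntime'
    ds = ds.assign({'stime': (['time'], ds.time.values)}).drop_vars('time').rename({'time': 'ntime'})
    # we might need to tweak this a bit further, depending on the actual data layout
    return ds

# recent xarray versions need combine='nested' whenever concat_dim is given
DS = xr.open_mfdataset('eraINTERIM_t2m_*.nc', combine='nested', concat_dim='cases', preprocess=preproc)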

The good thing here is that you keep the original time coordinate in stime while renaming the original dimension (time -> ntime).

If everything works well, the resulting dimensions should be (cases, ntime, latitude, longitude).

Disclaimer: I do something similar in a loop with a final concat (which works very well), but I did not test the preprocess approach.
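For reference, the loop-with-final-concat variant mentioned above could look roughly like the sketch below. It is only an illustration, not tested against the ERA-Interim files: make_case() is a hypothetical helper that builds small synthetic datasets in place of the real netCDF files, detach_time() is an assumed name for the per-file step, and the tiny grid sizes are made up. It also includes a plain concat for comparison, which shows 'time' growing because the two differing time coordinates get outer-joined.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_case(start, n_time=4):
    """Synthetic stand-in for one file: t2m(time, latitude, longitude)."""
    time = pd.date_range(start, periods=n_time, freq="D")
    lat = [10.0, 0.0, -10.0]
    lon = [0.0, 1.5, 3.0, 4.5, 6.0]
    data = np.random.rand(n_time, len(lat), len(lon))
    return xr.Dataset(
        {"t2m": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": lat, "longitude": lon},
    )


def detach_time(ds):
    """Store the timestamps in 'stime' and turn 'time' into a plain dimension 'ntime'."""
    stamps = ds["time"].values
    ds = ds.drop_vars("time").rename_dims({"time": "ntime"})
    return ds.assign(stime=("ntime", stamps))


# Two "files" covering different periods, as in the question.
cases = [make_case("2019-01-01"), make_case("2019-02-01")]

# Plain concat: the differing time coordinates are outer-joined, so 'time' grows.
naive = xr.concat(cases, dim="cases", join="outer")

# Loop plus final concat: 'ntime' carries no coordinate, so nothing gets aligned.
combined = xr.concat([detach_time(ds) for ds in cases], dim="cases")

print(naive["t2m"].dims, naive["t2m"].shape)
print(combined["t2m"].dims, combined["t2m"].shape)
```

With the synthetic data, the plain concat ends up with a 'time' dimension of length 8, while the detached version keeps 'ntime' at length 4 and stores each case's timestamps in stime with dimensions (cases, ntime). With real files, each dataset in the list would come from xr.open_dataset() instead of make_case().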

kmuehlbauer answered Oct 19 '22 05:10