I'm trying to open multiple netCDF files with xarray in Python. The files contain data of the same shape, and I want to join them along a new dimension.
I tried the concat_dim argument of xarray.open_mfdataset(), but it doesn't work as expected. An example is given below, which opens two files with temperature data for 124 times, 241 latitudes and 480 longitudes:
import xarray as xr

DS = xr.open_mfdataset('eraINTERIM_t2m_*.nc', concat_dim='cases')
da_t2m = DS.t2m
print(da_t2m)
With this code, I expected the resulting data array to have a shape like (cases: 2, time: 124, latitude: 241, longitude: 480). However, its shape was (cases: 2, time: 248, latitude: 241, longitude: 480). It creates the new dimension, but it also adds up the leftmost dimension, the 'time' dimension of the two datasets. I was wondering whether this is an error in 'xarray.open_mfdataset' or expected behavior because the 'time' dimension is UNLIMITED in both datasets.
Is there a way to join the data from these files directly with xarray and get the result I expected above?
Thank you.
Mateus
In order to concatenate CMEMS NetCDF files, you need to add a "record dimension" to the first .nc file and then concatenate the files. (A dimension may be used to represent a real physical dimension, for example time, latitude, longitude, etc.)
Create a new two-dimensional variable named peaks in a classic (NetCDF 3) format file named myncclassic.nc. Use the 'Dimensions' name-value pair argument to specify the names and lengths of the two dimensions. Use the 'Format' name-value pair argument to specify the file format. Write data to the variable.
Load libraries and create empty netCDF file: In Python, load the required libraries using the import statement. Assign the directory of the extracted data to the data_path variable. Use the netCDF4.Dataset function to create an empty netCDF file.
Extending from my comment I would try this:
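def preproc(ds):
    # keep the original timestamps as 'stime' and rename the 'time' dimension to 'ntime'
    ds = ds.assign({'stime': (['time'], ds.time)}).drop('time').rename({'time': 'ntime'})
    # we might need to tweak this a bit further, depending on the actual data layout
    return ds

# note: recent xarray versions also need combine='nested' whenever concat_dim is passed
DS = xr.open_mfdataset('eraINTERIM_t2m_*.nc', concat_dim='cases', preprocess=preproc)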
The good thing here is that you keep the original time coordinate in stime while renaming the original dimension (time -> ntime).
If everything works well, you should get the resulting dimensions (cases, ntime, latitude, longitude).
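To make the expected dimensions concrete, here is a minimal sketch on small synthetic in-memory datasets instead of the ERA-Interim files. The helper make_case, the tiny 4x3x5 grid and the dates are assumptions for illustration only. preproc is rewritten here with drop_vars, rename_dims and assign_coords as one possible variant of the function above, and xr.concat stands in for the combine step that open_mfdataset performs along a new dimension. The sketch also shows the likely cause of the doubled time axis: with an outer join, the two different time coordinates are aligned to their union and the gaps are filled with NaN.

```python
import numpy as np
import xarray as xr


def make_case(start, ntimes=4):
    """Synthetic stand-in for one file: t2m on a tiny (time, latitude, longitude) grid."""
    data = np.zeros((ntimes, 3, 5), dtype="float32")
    return xr.Dataset(
        {"t2m": (("time", "latitude", "longitude"), data)},
        coords={
            "time": np.datetime64(start) + np.arange(ntimes),  # daily steps
            "latitude": [10.0, 0.0, -10.0],
            "longitude": [0.0, 1.0, 2.0, 3.0, 4.0],
        },
    )


def preproc(ds):
    """Keep the timestamps as 'stime' and rename the 'time' dimension to 'ntime'."""
    stamps = ds["time"].values
    # Dropping the index coordinate leaves a plain dimension, which is not aligned on concat.
    ds = ds.drop_vars("time").rename_dims({"time": "ntime"})
    return ds.assign_coords(stime=("ntime", stamps))


case_a = make_case("2000-01-01")
case_b = make_case("2000-02-01")  # same shape, different timestamps

# Without preprocessing: the outer join (spelled out explicitly here) takes the
# union of both time axes, so 'time' grows from 4 to 8.
naive = xr.concat([case_a, case_b], dim="cases", join="outer")
print(dict(naive.sizes))

# With preprocessing: 'ntime' has no index to align, so the cases stack cleanly.
combined = xr.concat([preproc(case_a), preproc(case_b)], dim="cases", coords="different")
print(combined["t2m"].dims)
```

With real files, the same preproc could be passed to open_mfdataset through the preprocess argument, as in the call above.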
Disclaimer: I do something similar in a loop with a final concat (which works very well), but I did not test the preprocess approach.
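For reference, a loop with a final concat might look roughly like the sketch below. It is an assumption about the general approach, not the code referred to in the disclaimer: concat_cases and its opener argument are hypothetical names, and fake_open with its two file names exists only so the loop can be run without netCDF files on disk. With real data you would leave opener at its default, xr.open_dataset.

```python
import numpy as np
import xarray as xr


def preproc(ds):
    """Keep the timestamps as 'stime' and rename the 'time' dimension to 'ntime'."""
    stamps = ds["time"].values
    ds = ds.drop_vars("time").rename_dims({"time": "ntime"})
    return ds.assign_coords(stime=("ntime", stamps))


def concat_cases(paths, opener=xr.open_dataset):
    """Open each path, preprocess it, and stack the pieces along a new 'cases' dimension.

    `opener` is a hypothetical hook; by default each path is read with xr.open_dataset.
    """
    pieces = [preproc(opener(path)) for path in sorted(paths)]
    # coords="different" keeps a separate 'stime' row for every case.
    return xr.concat(pieces, dim="cases", coords="different")


def fake_open(path):
    """Stand-in for xr.open_dataset that returns synthetic data for two made-up file names."""
    start = {"case_a.nc": "2000-01-01", "case_b.nc": "2000-02-01"}[path]
    data = np.zeros((4, 3, 5), dtype="float32")
    return xr.Dataset(
        {"t2m": (("time", "latitude", "longitude"), data)},
        coords={
            "time": np.datetime64(start) + np.arange(4),  # daily steps
            "latitude": [10.0, 0.0, -10.0],
            "longitude": [0.0, 1.0, 2.0, 3.0, 4.0],
        },
    )


stacked = concat_cases(["case_b.nc", "case_a.nc"], opener=fake_open)
print(stacked["t2m"].dims)
print(stacked["stime"].shape)
```

Sorting the paths keeps the order of the cases reproducible, which a glob pattern on its own does not guarantee.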