
Get hourly average for each month from a netcdf file

I have a netCDF file with the time dimension containing data by the hour for 2 years. I want to average it to get an hourly average for each hour of the day for each month. I tried this:

import xarray as xr
ds = xr.open_mfdataset('ecmwf_usa_2015.nc')    
ds.groupby(['time.month', 'time.hour']).mean('time')

but I get this error:

*** TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension

How can I fix this? If I do this:

ds.groupby('time.month', 'time.hour').mean('time')

I do not get an error, but the result has a time dimension of 12 (one value for each month), whereas I want an hourly average for each month, i.e. 24 values for each of the 12 months. The data is available here: https://www.dropbox.com/s/yqgg80wn8bjdksy/ecmwf_usa_2015.nc?dl=0

user308827 asked Apr 02 '18 23:04



2 Answers

In case you didn't solve the problem yet, you can do it this way:

# define a function with the hourly calculation:
def hour_mean(x):
    return x.groupby('time.hour').mean('time')

# group by month, then apply the function:
ds.groupby('time.month').apply(hour_mean)

This is the same strategy as the first option given by @Prateek, which is based on the documentation, but the documentation was not that clear to me, so I hope this helps clarify. You can't chain a second groupby onto a GroupBy object, so the hourly grouping has to go into a function that is applied to each monthly group with .apply() (newer xarray releases call this method .map()).
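To see the shape of the result without downloading the file, here is a minimal sketch on synthetic data. The dataset below is an assumption made up for illustration (a single variable named t2m on a bare time axis, with values built so that each month/hour mean is easy to predict), and it assumes a reasonably recent xarray where GroupBy.map is available; on older releases .apply() plays the same role.

```python
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the real file: two years of hourly time steps.
time = pd.date_range("2015-01-01", "2016-12-31 23:00", freq=pd.Timedelta(hours=1))

# Encode month and hour in the value so the expected mean is known:
# every sample taken in month m at hour h has the value 100 * m + h.
values = (100 * time.month + time.hour).to_numpy().astype(float)
ds = xr.Dataset({"t2m": ("time", values)}, coords={"time": time})


def hour_mean(x):
    # Inner grouping: average each hour of the day within one month's data.
    return x.groupby("time.hour").mean("time")


# Outer grouping: one group per calendar month, each reduced by hour_mean.
result = ds.groupby("time.month").map(hour_mean)

print(dict(result.sizes))
print(result["t2m"].sel(month=3, hour=5).item())
```

The result should carry a month dimension and an hour dimension instead of time, which is the 12 by 24 layout the question asks for.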

JulianGiles answered Nov 19 '22 14:11



You are getting TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension because ds.groupby() expects a single xarray.DataArray or the name of a single variable or dimension, and you passed a list of names.

You have two options:

1. xarray bins --> group by hour

Refer to the groupby documentation: split the dataset into groups (bins) by month, and then apply groupby('time.hour') to each one.

This is needed because applying groupby on month and then on hour, whether one after the other or together, aggregates over all of the data. If you first split the data by month, you can then apply the group-by-hour mean within each month.

You can try this approach, as described in the documentation:

GroupBy: split-apply-combine

xarray supports “group by” operations with the same API as pandas to implement the split-apply-combine strategy:

  • Split your data into multiple independent groups. => Split the data by month using groupby_bins
  • Apply some function to each group. => Apply groupby('time.hour') to each month
  • Combine your groups back into a single data object. => Apply the aggregate function mean('time')
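The three steps above can be spelled out as an explicit loop. This is only a sketch under stated assumptions: the dataset is synthetic (a hypothetical t2m variable on a one-year hourly time axis), and it uses a plain groupby('time.month') for the split step in place of groupby_bins, since month labels already define the groups.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical one-year hourly series; the values are just 0, 1, 2, ...
time = pd.date_range("2015-01-01", "2015-12-31 23:00", freq=pd.Timedelta(hours=1))
ds = xr.Dataset(
    {"t2m": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

months = []
pieces = []
# Split: iterating a GroupBy object yields (label, subset) pairs.
for month, subset in ds.groupby("time.month"):
    # Apply: hourly mean within this month only.
    pieces.append(subset.groupby("time.hour").mean("time"))
    months.append(int(month))

# Combine: stack the monthly results along a new "month" dimension.
combined = xr.concat(pieces, dim=pd.Index(months, name="month"))

print(dict(combined.sizes))
```

Writing the loop by hand gives the same month by hour layout as applying a function to each monthly group; it is mainly useful for inspecting individual months while debugging.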

2. convert it into pandas dataframe and use group by

Warning: Not all netCDF files can be converted to a pandas DataFrame, and metadata may be lost in the conversion.

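Convert ds into a pandas DataFrame with df = ds.to_dataframe() and group as required. pandas.Grouper covers calendar frequencies such as pd.Grouper(freq='1M'); for month-of-year and hour-of-day averages, note that to_dataframe() puts time in the index rather than in a column, so group on that index level directly, like

times = df.index.get_level_values('time')
df.groupby([times.month.rename('month'), times.hour.rename('hour')]).mean()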

Note: I saw a couple of answers that use pandas.TimeGrouper, but it is deprecated and one has to use pandas.Grouper now.
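As a rough illustration of the pandas route, the sketch below builds a small DataFrame shaped like the output of ds.to_dataframe() and averages by month and hour of day. The latitude/longitude values and the t2m column are assumptions invented for the example, not taken from the linked file, and the grouping uses the month and hour attributes of the time index level rather than pandas.Grouper.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking ds.to_dataframe(): the dimensions become
# index levels and each data variable becomes a column.
time = pd.date_range("2015-01-01", "2015-12-31 23:00", freq=pd.Timedelta(hours=1))
index = pd.MultiIndex.from_product(
    [[40.0, 41.0], [-100.0], time],
    names=["latitude", "longitude", "time"],
)
# Each sample in month m at hour h has the value 100 * m + h.
one_point = (100 * time.month + time.hour).to_numpy().astype(float)
df = pd.DataFrame({"t2m": np.tile(one_point, 2)}, index=index)

# Group on the month and hour of the "time" index level.
times = df.index.get_level_values("time")
keys = [times.month.rename("month"), times.hour.rename("hour")]
hourly_by_month = df.groupby(keys)["t2m"].mean()

print(hourly_by_month.loc[(3, 5)])
print(len(hourly_by_month))
```

Note that this averages over every grid point as well as over time; to keep the spatial dimensions, the latitude and longitude levels would need to be added to the grouping keys.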

Since your dataset is large, the question does not include a minimal sample, and working with the full file is resource-intensive, I would suggest looking at these pandas examples:

  1. group by weekdays
  2. group by time
  3. groupby-date-range-depending-on-each-row
  4. group-and-count-rows-by-month-and-year
Morse answered Nov 19 '22 14:11
