I have a netCDF file with the time dimension containing data by the hour for 2 years. I want to average it to get an hourly average for each hour of the day for each month. I tried this:
import xarray as xr
ds = xr.open_mfdataset('ecmwf_usa_2015.nc')
ds.groupby(['time.month', 'time.hour']).mean('time')
but I get this error:
*** TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension
How can I fix this? If I do this:
ds.groupby('time.month', 'time.hour').mean('time')
I do not get an error but the result has a time dimension of 12 (one value for each month), whereas I want an hourly average for each month i.e. 24 values for each of 12 months. Data is available here: https://www.dropbox.com/s/yqgg80wn8bjdksy/ecmwf_usa_2015.nc?dl=0
In case you didn't solve the problem yet, you can do it this way:
# define a function with the hourly calculation:
def hour_mean(x):
    return x.groupby('time.hour').mean('time')
# group by month, then apply the function:
ds.groupby('time.month').apply(hour_mean)
This is the same strategy as the first option given by @Prateek, which is based on the documentation. I did not find the documentation very clear, so I hope this helps clarify it. You can't apply a groupby operation to a groupby object, so you have to wrap the second grouping in a function and pass that function to .apply() for it to work.
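To make the nested grouping easy to check without the original file, here is a minimal sketch on synthetic data. The dataset, the variable name t2m, and the encoding of each value as 100 * month + hour are assumptions made purely for illustration. It uses .map(), which recent xarray versions provide as the newer name for .apply() on groupby objects.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file (an assumption): two years of hourly data.
time = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")

# Each value encodes its own month and hour, so the expected mean is obvious.
values = 100.0 * time.month.values + time.hour.values
ds = xr.Dataset({"t2m": ("time", values)}, coords={"time": time})


def hour_mean(x):
    # Inner grouping: average every hour of the day within one month's data.
    return x.groupby("time.hour").mean("time")


# Outer grouping: split by month, then run the hourly mean on each month.
result = ds.groupby("time.month").map(hour_mean)

print(dict(result.sizes))
print(float(result["t2m"].sel(month=3, hour=7)))
```

The result has a month dimension and an hour dimension instead of time, so a single month/hour cell can be picked out with .sel(month=..., hour=...).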
You are getting "TypeError: group must be an xarray.DataArray or the name of an xarray variable or dimension" because ds.groupby() expects a single xarray DataArray or the name of a variable or dimension, and you passed a list of names.
Refer to the group by documentation, split the dataset into groups or bins, and then apply groupby('time.hour') to each of them.
This is because applying groupby on month and then on hour, whether one after the other or together, aggregates all the data. If you first split the dataset into monthly subsets, you can apply the group-by mean to each month separately.
You can try this approach, as mentioned in the documentation:
GroupBy: split-apply-combine
xarray supports “group by” operations with the same API as pandas to implement the split-apply-combine strategy:
- Split your data into multiple independent groups. => Split them by month using groupby_bins.
- Apply some function to each group. => Apply the group by to each month.
- Combine your groups back into a single data object. => Apply the aggregate function mean('time').
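The three steps can also be written out by hand. This is only a sketch under assumptions: it runs on a synthetic one-year hourly series (the name t2m and the values are made up), and it uses a plain groupby('time.month') for the split rather than groupby_bins, since the month labels are already available from the time coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series for one non-leap year (an assumption for illustration).
time = pd.date_range("2015-01-01", periods=24 * 365, freq="h")
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="t2m",
)

months = []
pieces = []
# Split: iterating over a groupby yields (label, subset) pairs.
for month, subset in da.groupby("time.month"):
    # Apply: hourly mean within this month only.
    pieces.append(subset.groupby("time.hour").mean("time"))
    months.append(int(month))

# Combine: stack the monthly results along a new "month" dimension.
combined = xr.concat(pieces, dim=pd.Index(months, name="month"))

print(combined.dims, combined.shape)
```

Writing the loop explicitly is more verbose than passing a function to the groupby object, but it makes it clear that the hourly grouping only ever sees one month of data at a time.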
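Warning: not every netCDF file converts cleanly to a pandas DataFrame; metadata may be lost in the conversion.
Convert ds into a pandas DataFrame indexed by time with df = ds.to_dataframe().reset_index().set_index('time')
and group as you require using pandas.Grouper together with the hour of the index, like
df.groupby([pd.Grouper(freq='1M'), df.index.hour]).mean()
The Grouper yields one group per calendar month in the record; to pool the same month across years, group by df.index.month instead.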
Note: I saw a couple of answers using pandas.TimeGrouper, but it is deprecated and one has to use pandas.Grouper now.
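For the specific goal in the question (24 hourly means for each of the 12 months), a pandas route could look like the sketch below. It is not the asker's data: the dataset is synthetic, the variable name t2m is assumed, and each value encodes 100 * month + hour so the result can be verified by eye. It groups on the month and hour extracted from the time column rather than on a frequency-based Grouper, so the same month from different years is pooled.

```python
import pandas as pd
import xarray as xr

# Synthetic two-year hourly dataset (an assumption, not the original file).
time = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
ds = xr.Dataset(
    {"t2m": ("time", 100.0 * time.month.values + time.hour.values)},
    coords={"time": time},
)

# reset_index() turns every dimension, including time, into an ordinary column.
df = ds.to_dataframe().reset_index()

# Group by month of year and hour of day, then average the variable.
clim = df.groupby(
    [df["time"].dt.month.rename("month"), df["time"].dt.hour.rename("hour")]
)["t2m"].mean()

print(len(clim))
print(clim.loc[(3, 7)])
```

The result is a Series with a (month, hour) MultiIndex. Calling clim.to_xarray() turns it back into an xarray object with month and hour dimensions if that is more convenient downstream.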
Since your dataset is large, the question does not include a minimal sample, and working on the full file consumes heavy resources, I would suggest looking at these examples on pandas