
Compute annual mean using xarray

I have a python xarray dataset with time,x,y for its dimensions and value1 as its variable. I'm trying to compute annual mean of value1 for each x,y coordinate pair.

I've run into this function while reading the docs:

ds.groupby('time.year').mean()  

This seems to compute a single annual mean across all x,y coordinate pairs in value1, rather than a separate annual mean for each individual x,y coordinate pair.

While the code snippet above produces the wrong output, I really like how concise it is. I would like to figure out the "xarray trick" for computing the annual mean of each x,y coordinate pair rather than hacking it together myself.

Can someone point me in the right direction? Should I temporarily turn this into a pandas object?

Conic asked Mar 10 '23 22:03



2 Answers

To avoid the default of averaging over all dimensions, pass the dimension you want to average over explicitly: ds.groupby('time.year').mean('time'). The result keeps x and y and has one value per year for each coordinate pair.
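As a rough illustration of the one-liner above, here is a minimal sketch on made-up data. The dataset, grid size, and date range are assumptions for the example, not the asker's data. Newer xarray releases may already reduce over only the grouped dimension by default, so it is worth checking your installed version, but naming the dimension explicitly is unambiguous either way.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example data: two years of daily values on a 3 x 4 grid.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"value1": (("time", "y", "x"), rng.random((time.size, 3, 4)))},
    coords={"time": time, "y": np.arange(3), "x": np.arange(4)},
)

# Average over "time" only, so "x" and "y" survive in the result.
annual = ds.groupby("time.year").mean("time")

print(annual["value1"].dims)   # includes "year", "y" and "x"
print(annual["year"].values)   # one entry per calendar year
```

Each (y, x) cell of annual["value1"] holds the mean of that cell's daily values within the given year.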

shoyer answered Mar 28 '23 15:03



Note that calling ds.groupby('time.year').mean('time') will be inaccurate if you are working with monthly rather than daily data. A plain mean gives equal weight to months of different lengths (e.g., February and July), so shorter months are over-represented in the annual mean.
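To see the size of the effect, here is a small arithmetic sketch with made-up monthly values (an exaggerated case where only February is non-zero, in a non-leap year). It uses plain NumPy rather than xarray, purely to show the difference between the two averages.

```python
import numpy as np

# Days per month in a non-leap year.
days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# Made-up monthly means: 0.0 in every month except February, which is 10.0.
monthly = np.zeros(12)
monthly[1] = 10.0

unweighted = monthly.mean()                    # treats every month equally: 10 / 12
weighted = np.average(monthly, weights=days)   # weights by month length: 10 * 28 / 365

print(unweighted, weighted)
```

The unweighted mean counts February as 1/12 of the year, while the weighted mean counts it as 28/365, so the unweighted value comes out larger here.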

Instead, use the function below, from NCAR:

import numpy as np
import xarray as xr


def weighted_temporal_mean(ds, var):
    """
    Annual mean of ds[var], weighting each month by its number of days.
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month

    # Calculate the weights (each month's share of the days in its year)
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()

    # Make sure the weights in each year add up to 1
    np.testing.assert_allclose(wgts.groupby("time.year").sum("time"), 1.0)

    # Subset our dataset for our variable
    obs = ds[var]

    # Set up the mask for NaN values: 0 where missing, 1 where valid
    cond = obs.isnull()
    ones = xr.where(cond, 0.0, 1.0)

    # Calculate the numerator ("YS" = year start; "AS" is deprecated in recent pandas)
    obs_sum = (obs * wgts).resample(time="YS").sum(dim="time")

    # Calculate the denominator (sum of weights at valid points only)
    ones_out = (ones * wgts).resample(time="YS").sum(dim="time")

    # Return the weighted average
    return obs_sum / ones_out


# ds_first_five_years and 'TEMP' are placeholders; substitute your own dataset and variable
average_weighted_temp = weighted_temporal_mean(ds_first_five_years, 'TEMP')
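For comparison, here is a compact sketch of the same idea written with groupby instead of resample and checked against np.average. This is not the NCAR function itself: the data, grid size, and variable name are assumptions made up for the example, and the result is indexed by year rather than by a year-start timestamp.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly data for 2001-2002 on a 2 x 2 grid.
time = pd.date_range("2001-01-01", periods=24, freq="MS")
rng = np.random.default_rng(1)
data = rng.random((24, 2, 2))
data[0, 0, 0] = np.nan  # one missing value in January 2001
da = xr.DataArray(data, dims=("time", "y", "x"), coords={"time": time}, name="value1")

# Number of days in each month, used as the weights.
days = da.time.dt.days_in_month

# Weighted sum of the valid values (NaNs are skipped by sum) ...
num = (da * days).groupby("time.year").sum("time")
# ... divided by the sum of the weights at the valid points only.
den = (da.notnull() * days).groupby("time.year").sum("time")
annual = num / den

# Cross-check one cell against NumPy, skipping the missing January value.
check = np.average(data[1:12, 0, 0], weights=days.values[1:12])
print(float(annual.sel(year=2001).isel(y=0, x=0)), check)
```

Dividing by the sum of weights at valid points means a missing month drops out of both the numerator and the denominator, rather than pulling the mean toward zero.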
Skrt answered Mar 28 '23 15:03
