
Add 'constant' dimension to xarray Dataset

I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write to netCDF. I've had great experience using xarray (xray) in the past so thought I'd use it for this task.

I can easily get them into a 2D DataArray with something like:

import numpy as np
import pandas as pd
import xarray as xr

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)

But when I try to add another dimension, which would convey information about time (all data is from the same year/month), things start to go sour.

I've tried two ways to crack this:

1) expand my input data to m x n x 1, something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]

Then I follow the same steps as above, with coords updated to contain a third dimension.

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')

This is fine for creating a DataArray -- but when I try to convert it to a Dataset (so I can write to netCDF), I get the error 'ValueError: Coordinate objects must be 1-dimensional'.

2) The second approach I've tried is taking my DataArray, casting it to a DataFrame, setting the index to ['lat','lng', 'time'] and then going back to a Dataset with xr.Dataset.from_dataframe(). This runs for 20+ min before I kill the process.

Does anyone know how I can get a Dataset with a monthly 'time' dimension?

badgley asked May 11 '16 21:05



1 Answer

Your first example is pretty close:

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')

You'll notice a few changes in my version:

  1. I'm passing in a list for the 'time' coordinate instead of a scalar. You need to pass in a list or 1D array to get a 1D coordinate variable, which is what you need if you also use 'time' as a dimension. That's what the error ValueError: Coordinate objects must be 1-dimensional is trying to tell you (by the way -- if you have ideas for how to make that error message more helpful, I'm all ears!).
  2. I'm providing a dims argument to the DataArray constructor. Passing in a (non-ordered) dictionary is a little dangerous because the iteration order is not guaranteed.
  3. I also switched to datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.
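
For reference, here is a self-contained sketch of the list-based version with the imports filled in. The date values and the all-ones data array are placeholders of my own, since the snippets above leave year, month, day and data undefined; substitute your real grid.

```python
import datetime

import numpy as np
import xarray as xr

# Placeholder date (assumption): the snippets above leave these undefined.
year, month, day = 2016, 5, 1

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

# Stand-in data (assumption) with a trailing length-1 axis for 'time'.
data = np.ones((360, 720))[:, :, np.newaxis]

# 'time' is a one-element list, so it becomes a 1D coordinate variable.
coords = {'lat': lats, 'lng': lngs,
          'time': [datetime.datetime(year, month, day)]}

# dims fixes the axis order explicitly rather than relying on the dict.
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
ds = da.to_dataset(name='variable_name')
```

The order of dims has to match the axis order of data, which is why the new axis is appended last in both places.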

Another sensible approach is to use concat with a list of one item once you've added 'time' as a scalar coordinate, e.g.,

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
expanded_da = xr.concat([da], 'time')

This version generalizes nicely to joining together data from a bunch of days -- you simply make the list of DataArrays longer. In my experience, most of the time the reason why you want the extra dimension in the first place is to be able to concat along it. Length 1 dimensions are not very useful otherwise.
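
As a rough sketch of that generalization, the snippet below builds one 2D DataArray per month and joins them along 'time'. The helper monthly_slice, the constant-valued grids, and the choice of the first day of each month as the timestamp are all illustrative assumptions, not part of the question; in practice each grid would come from one of the CSV files.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def monthly_slice(data, year, month):
    """Hypothetical helper: wrap one 2D grid with a scalar 'time' coordinate."""
    coords = {'lat': lats, 'lng': lngs,
              # Assumption: label each month by its first day.
              'time': datetime.datetime(year, month, 1)}
    return xr.DataArray(data, coords=coords, dims=['lat', 'lng'])


# Stand-in grids (assumption): each month's grid is filled with its month number.
slices = [monthly_slice(np.full((360, 720), float(m)), 2016, m)
          for m in (1, 2, 3)]

# Concatenating along the scalar 'time' coordinate turns it into a dimension.
combined = xr.concat(slices, 'time')
ds = combined.to_dataset(name='variable_name')
```

Because each slice carries its own scalar 'time' value, the resulting 'time' coordinate is filled in from the slices themselves and no separate index needs to be built.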

shoyer answered Oct 23 '22 19:10