I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write to netCDF. I've had great experience using xarray (xray) in the past, so I thought I'd use it for this task.
I can easily get them into a 2D DataArray with something like:
import numpy as np
import pandas as pd
import xarray as xr
data = np.ones((360, 720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords = {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)
But when I try to add another dimension, which would convey information about time (all data is from the same year/month), things start to go sour.
I've tried two ways to crack this:
1) expand my input data to m x n x 1, something like:
data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords = {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]
Then I follow the same steps as above, with coords updated to contain a third dimension.
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords = {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')
This is fine for creating a DataArray -- but when I try to convert to a dataset (so I can write to netCDF), I get an error about 'ValueError: Coordinate objects must be 1-dimensional'
2) The second approach I've tried is taking my DataArray, casting it to a DataFrame, setting the index to ['lat', 'lng', 'time'], and then going back to a Dataset with xr.Dataset.from_dataframe(). I've tried this, but it takes 20+ minutes before I kill the process.
Does anyone know how I can get a Dataset with a monthly 'time' dimension?
Starting with a DataFrame, you can directly convert it to a Dataset. This can be an excellent starting point since it creates an xarray object for you. For example, take a DataFrame with one variable, y, and one index, x, and call to_xarray() on it to get an xarray object. The resulting dataset isn't formatted very well yet.
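A minimal sketch of that conversion, assuming a toy DataFrame (the values and the names y and x here are illustrative, not taken from the question's data):

```python
import pandas as pd
import xarray as xr  # DataFrame.to_xarray() needs xarray to be installed

# Hypothetical toy data: one variable "y" indexed by "x".
df = pd.DataFrame(
    {"y": [10.0, 20.0, 30.0]},
    index=pd.Index([0, 1, 2], name="x"),
)

# The named index becomes a dimension with a coordinate,
# and each column becomes a data variable.
ds = df.to_xarray()
print(ds)
```

The result is a plain Dataset with no attributes or units, which is why it still needs tidying afterwards.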
The following syntax is used to create a dataset with xarray: ds = xr.Dataset(data_vars, coords, attrs). A complete dataset consists of three dictionaries: data_vars: the key is the variable name and the value is a tuple consisting of
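As a rough illustration of the constructor, here is a sketch that assumes the commonly used (dims, data) tuple form for each data variable and reuses the grid from the question. The variable name and the attribute are placeholders.

```python
import numpy as np
import xarray as xr

# Same 0.5-degree grid as in the question.
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

ds = xr.Dataset(
    # variable name -> (dimension names, array)
    data_vars={"variable_name": (("lat", "lng"), np.ones((360, 720)))},
    # coordinate name -> 1D labels for the dimension of the same name
    coords={"lat": lats, "lng": lngs},
    # free-form metadata; this entry is purely illustrative
    attrs={"description": "illustrative attribute"},
)
```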
If DA is your data array with length DimLen, you can now use expand_dims.
Because of the way that math is applied over new dimensions, I like to multiply in order to add new dimensions:
identityb = xr.DataArray(np.ones_like(b_coords), coords=[('b', b_coords)])
y = x * identityb
Using the .assign_coords method will also do it.
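Both ideas can be sketched on the question's grid. The date below is a placeholder, and passing the coordinate values as a keyword to expand_dims assumes a reasonably recent xarray release.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
da = xr.DataArray(
    np.ones((360, 720)),
    coords={"lat": lats, "lng": lngs},
    dims=["lat", "lng"],
)

stamp = datetime.datetime(2016, 1, 1)  # placeholder date

# Option 1: add a length-1 "time" dimension together with its coordinate.
expanded = da.expand_dims(time=[stamp])

# Option 2: the multiplication trick, broadcasting against an array of ones.
time_ones = xr.DataArray(np.ones(1), coords=[("time", [stamp])])
y = da * time_ones
```

By default expand_dims puts the new dimension first, while the broadcast appends it after the existing ones; transpose can reorder either result if the order matters.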
There's a distinction between data variables and coordinates, according to CF conventions. Xarray follows these conventions, but the distinction is mostly semantic and you don't have to follow it. I see it like this: a data variable is the data of interest, and a coordinate is a label to describe the data of interest.
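To make the distinction concrete, here is a small sketch with made-up names (temperature, station_id) showing that the same array can be moved between the two roles:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: both arrays start out as data variables.
ds = xr.Dataset(
    {
        "temperature": (("x",), np.array([1.0, 2.0, 3.0])),
        "station_id": (("x",), np.array([101, 102, 103])),
    },
    coords={"x": [0, 1, 2]},
)

# Treat station_id as a label for the data rather than data of interest.
relabelled = ds.set_coords("station_id")

# reset_coords moves it back to being a data variable.
restored = relabelled.reset_coords("station_id")
```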
Your first example is pretty close:
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords = {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')
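For reference, a self-contained version of the snippet above might look like the following. The date values are placeholders, data is the m x n x 1 array from the question, and the commented-out write step uses a hypothetical file name and assumes a netCDF backend (such as netCDF4 or scipy) is installed.

```python
import datetime

import numpy as np
import xarray as xr

year, month, day = 2016, 1, 1  # placeholder date

# m x n x 1 array, as in the question's first attempt.
data = np.ones((360, 720))[:, :, np.newaxis]
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

coords = {
    "lat": lats,
    "lng": lngs,
    # A list (not a scalar), so "time" is a 1D coordinate of length 1.
    "time": [datetime.datetime(year, month, day)],
}
da = xr.DataArray(data, coords=coords, dims=["lat", "lng", "time"])
ds = da.to_dataset(name="variable_name")

# ds.to_netcdf("output.nc")  # hypothetical path; needs a netCDF backend
```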
You'll notice a few changes in my version:
1) I'm passing in a list for the 'time' coordinate instead of a scalar. You need a list or 1D array to get a 1D coordinate variable, which is what you need if 'time' is also a dimension. That is what the error message
ValueError: Coordinate objects must be 1-dimensional
is trying to tell you (by the way, if you have ideas for how to make that error message more helpful, I'm all ears!).
2) I'm providing a dims argument to the DataArray constructor. Passing in a (non-ordered) dictionary is a little dangerous because the iteration order is not guaranteed.
3) I'm using datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.
Another sensible approach is to use concat with a list of one item once you've added 'time' as a scalar coordinate, e.g.,
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords = {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
expanded_da = xr.concat([da], 'time')
This version generalizes nicely to joining together data from a bunch of days: you simply make the list of DataArrays longer. In my experience, most of the time the reason you want the extra dimension in the first place is to be able to concat along it. Length-1 dimensions are not very useful otherwise.
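As a sketch of that generalization, the example below stacks three months along 'time'. The load_month helper is hypothetical: it stands in for reading one monthly CSV grid and simply fills the grid with the month number.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def load_month(year, month):
    """Hypothetical stand-in for reading one monthly CSV grid."""
    data = np.full((360, 720), float(month))
    coords = {
        "lat": lats,
        "lng": lngs,
        # Scalar coordinate: labels the whole 2D grid with its month.
        "time": datetime.datetime(year, month, 1),
    }
    return xr.DataArray(data, coords=coords, dims=["lat", "lng"])


# One 2D DataArray per month, joined along a new "time" dimension.
monthly = [load_month(2016, m) for m in (1, 2, 3)]
combined = xr.concat(monthly, "time")
ds = combined.to_dataset(name="variable_name")
```

Because each array carries its own scalar 'time' coordinate, concat builds the 'time' index from those values without any extra bookkeeping.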