Remove a dimension from some variables in an xarray Dataset

I have an xarray Dataset where some variables have more dimensions than necessary (e.g., a 3D dataset where the "latitude" and "longitude" variables also vary along time). How do I remove the extra dimensions?

For example, in the dataset below, 'bar' is a 2D variable along the x and y axes, with constant values along the x axis. How do I remove the x dimension from 'bar' but not 'foo'?

>>> import numpy as np
>>> import xarray as xr
>>> ds = xr.Dataset({'foo': (('x', 'y'), np.random.randn(2, 3))},
...                 {'x': [1, 2], 'y': [1, 2, 3],
...                  'bar': (('x', 'y'), [[4, 5, 6], [4, 5, 6]])})
>>> ds
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2 3
    bar      (x, y) int64 4 5 6 4 5 6
Data variables:
    foo      (x, y) float64 -0.9595 0.6704 -1.047 0.9948 0.8241 1.643
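Before removing a dimension, it can be worth confirming that the variable really is constant along it. A minimal sketch, assuming a recent xarray and NumPy; `is_constant_along_x` is just an illustrative name, and comparing the max and min along `x` is one possible check, not an xarray built-in:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    {"x": [1, 2], "y": [1, 2, 3], "bar": (("x", "y"), [[4, 5, 6], [4, 5, 6]])},
)

# 'bar' is redundant along x if its max and min over x agree at every y.
# Reducing over "x" removes "x" (and coordinates along it) from the result.
bar = ds["bar"]
is_constant_along_x = bool((bar.max("x") == bar.min("x")).all())
print(is_constant_along_x)  # True
```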
asked Jan 24 '17 at 18:01 by shoyer


1 Answer

The most direct way to remove the extra dimension (using indexing) results in a slightly confusing error message:

>>> ds['bar'] = ds['bar'].sel(x=1)
ValueError: dimension 'x' already exists as a scalar variable

The problem is that indexing in xarray keeps the indexed coordinates around as scalar coordinates:

>>> ds['bar'].sel(x=1)
<xarray.DataArray 'bar' (y: 3)>
array([4, 5, 6])
Coordinates:
    x        int64 1
  * y        (y) int64 1 2 3
    bar      (y) int64 4 5 6

This is often useful, but here the scalar coordinate 'x' on the indexed array conflicts with the non-scalar coordinate (and dimension) 'x' on the original dataset when you try to assign it back. As a result, xarray raises an error instead of overwriting the variable.
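To see the leftover scalar coordinate directly, here is a small self-contained check, assuming a recent xarray (`indexed` is an illustrative name):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    {"x": [1, 2], "y": [1, 2, 3], "bar": (("x", "y"), [[4, 5, 6], [4, 5, 6]])},
)

indexed = ds["bar"].sel(x=1)
# The dimension "x" is gone from the array...
print(indexed.dims)  # ('y',)
# ...but "x" survives as a 0-dimensional (scalar) coordinate.
print("x" in indexed.coords, indexed.coords["x"].ndim)  # True 0
```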

To get around this, you need to drop the scalar 'x' after indexing. In the current version of xarray, you can do this with drop:

>>> ds['bar'] = ds['bar'].sel(x=1).drop('x')
>>> ds
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2 3
    bar      (y) int64 4 5 6
Data variables:
    foo      (x, y) float64 -0.9595 0.6704 -1.047 0.9948 0.8241 1.643
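In newer xarray releases, calling `drop` on a DataArray is discouraged in favor of `drop_vars`. The same fix written against that API, as a self-contained sketch assuming an xarray release recent enough to have `drop_vars`:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    {"x": [1, 2], "y": [1, 2, 3], "bar": (("x", "y"), [[4, 5, 6], [4, 5, 6]])},
)

# Take the x=1 slice of 'bar', then discard the leftover scalar 'x'
# coordinate so it cannot clash with the dataset's 'x' dimension coordinate.
ds["bar"] = ds["bar"].sel(x=1).drop_vars("x")

print(ds["bar"].dims)  # ('y',)
print(ds["foo"].dims)  # ('x', 'y')
```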

Starting with xarray v0.9, you can also drop the coordinate as part of indexing by passing drop=True, e.g., ds['bar'].sel(x=1, drop=True).
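A self-contained version of the `drop=True` approach, as a sketch assuming a recent xarray. `isel` accepts the same `drop` flag if you prefer to pick the slice by position rather than by label (`by_label` and `by_position` are illustrative names):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    {"x": [1, 2], "y": [1, 2, 3], "bar": (("x", "y"), [[4, 5, 6], [4, 5, 6]])},
)

# drop=True discards the selected 'x' label instead of keeping it
# as a scalar coordinate on the result.
by_label = ds["bar"].sel(x=1, drop=True)
by_position = ds["bar"].isel(x=0, drop=True)  # same slice, chosen by position

print(by_label.dims)           # ('y',)
print("x" in by_label.coords)  # False

ds["bar"] = by_label
print(ds["bar"].dims, ds["foo"].dims)  # ('y',) ('x', 'y')
```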

answered Sep 23 '22 at 03:09 by shoyer