
Subsetting xarray.Dataset with respect to multiple coordinates

Say I have an xarray.Dataset object loaded in using xarray.open_dataset(..., decode_times=False) that looks like this when printed:

<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 15, lon: 34, plev: 8, time: 3650)
Coordinates:
  * time       (time) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
  * plev       (plev) float64 1e+05 8.5e+04 7e+04 5e+04 2.5e+04 1e+04 5e+03 ...
  * lat        (lat) float64 40.46 43.25 46.04 48.84 51.63 54.42 57.21 60.0 ...
  * lon        (lon) float64 216.6 219.4 222.2 225.0 227.8 230.6 233.4 236.2 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
    lat_bnds   (lat, bnds) float64 39.07 41.86 41.86 44.65 44.65 47.44 47.44 ...
    lon_bnds   (lon, bnds) float64 215.2 218.0 218.0 220.8 220.8 223.6 223.6 ...
    hus        (time, plev, lat, lon) float64 0.006508 0.007438 0.008751 ...

What would be the best way to subset this given multiple ranges for lat, lon, and time? I've tried chaining a series of conditions and used xarray.Dataset.where, but I get an error saying:

IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().

I can't load the entire dataset into memory, so what would be the typical way to do this?

pbreach asked Oct 18 '25 12:10


1 Answer

NetCDF4 doesn't support all of the multi-dimensional indexing operations that NumPy supports. It does support slicing (which is very fast) and one-dimensional indexing (somewhat slower).
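To make the two indexing forms concrete, here is a minimal sketch on a small in-memory array. The array `da` and its shape are made up for illustration; it is not netCDF-backed, so it only shows what each form looks like, not the speed difference.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array standing in for a variable like "hus".
da = xr.DataArray(
    np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6),
    dims=("time", "lat", "lon"),
)

# Slicing: a contiguous range along one dimension.
by_slice = da.isel(time=slice(1, 3))

# One-dimensional indexing: an explicit list of positions along one dimension.
by_array = da.isel(lat=[0, 2, 4])

# Both forms can be combined, applying the slice first.
combined = da.isel(time=slice(1, 3)).isel(lat=[0, 2, 4])
```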

Some things to try:

  • Index with slices (e.g., .sel(time=slice(start, end))) before indexing with 1-dimensional arrays. This should offload the array-based indexing from netCDF4 to Dask/NumPy.
  • Split up your indexing operations into more intermediate operations that index along fewer dimensions at once. It sounds like you've already tried this one, but maybe it's worth exploring a little more.
  • To optimize performance, try different Dask chunking schemes using .chunk().
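The first two suggestions might look roughly like the sketch below. It builds a small synthetic dataset with the same dimension names as the one in the question (the values, the shortened time axis, and the helper name `subset_box` are assumptions for illustration), cuts out the bounding box with slices only, and then applies a non-contiguous selection along a single dimension.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset in the question (values are made up).
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"hus": (("time", "plev", "lat", "lon"), rng.random((100, 8, 15, 34)))},
    coords={
        "time": np.arange(33220.0, 33320.0),
        "plev": np.array([1e5, 8.5e4, 7e4, 5e4, 2.5e4, 1e4, 5e3, 1e3]),
        "lat": np.linspace(40.46, 79.5, 15),
        "lon": np.linspace(216.6, 309.0, 34),
    },
)


def subset_box(ds, time_range, lat_range, lon_range):
    """Hypothetical helper: cut out a bounding box using label-based slices only."""
    return ds.sel(
        time=slice(*time_range),
        lat=slice(*lat_range),
        lon=slice(*lon_range),
    )


# Step 1: slices first, so only the bounding box is carried forward.
box = subset_box(ds, (33230.0, 33250.0), (45.0, 60.0), (220.0, 240.0))

# Step 2: a non-contiguous selection (two time ranges) along one dimension,
# expressed as a 1-D integer index on the already-reduced box.
keep_time = (box.time < 33235.0) | (box.time > 33245.0)
result = box.isel(time=np.flatnonzero(keep_time.values))
```

With a file on disk, the same calls would apply to the object returned by xarray.open_dataset. If Dask is installed, passing something like chunks={"time": 365} to open_dataset (the chunk size here is an arbitrary guess) or calling .chunk() afterwards is one way to experiment with the third suggestion.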

If that doesn't work, post a full self-contained example to the xarray issue tracker on GitHub and we can take a look into it in more detail.

shoyer answered Oct 21 '25 01:10


