xarray.open_mfdataset for a small subset of variables

Question

I'm trying to read a time series of a single WRF output variable. The time series is distributed, one timestamp per file, across more than 5000 netCDF files. Each file contains roughly 200 variables.

Is there a way to call xarray.open_mfdataset() for only the variable I'm interested in? I can specify a single variable by providing a list to the 'data_vars' argument, but it still reads everything for the 'minimal' case. For my files the 'minimal' case includes almost everything and is thus relatively slow.

Is my best bet to create a single netCDF file containing my variable of interest with something like ncrcat, or is there a more streamlined way to do this entirely within xarray (or some other Python tool)?

My netCDF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

momme · Accepted Answer

Another option is to pass a function through the "preprocess" keyword argument of open_mfdataset that selects the variables to keep. The function is applied to each file's dataset before the datasets are combined, e.g.:

preprocess=lambda ds: ds[variablelist]
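
For context, a fuller version of that call might look like the sketch below. The variable name "T2", the helper names keep_selected and open_subset, and the combine/concat_dim settings are assumptions for illustration and do not come from the question; "Time" is assumed to be the name of the time dimension in the files.

```python
import xarray as xr

# Hypothetical name: replace with the variable(s) you actually need.
variablelist = ["T2"]


def keep_selected(ds):
    """Return a dataset holding only the variables in variablelist."""
    return ds[variablelist]


def open_subset(pattern):
    """Open every file matching pattern, subsetting each one before combining."""
    return xr.open_mfdataset(
        pattern,
        preprocess=keep_selected,
        combine="nested",   # assumes sorted file names are in time order
        concat_dim="Time",  # assumed name of the time dimension
    )
```

Calling open_subset with a glob pattern that matches the output files would then return a dataset holding only the selected variable and its coordinates, concatenated along the time dimension.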

bwc · Answer

I'm not sure why providing the data_vars= argument still reads all the data; I experienced the same issue reading WRF output. My workaround was to make a list of all the variables I didn't need (all 200+) and feed that to the drop_variables= argument. You can get a list of all variables from one of the files, and then just delete or comment out the ones you want to keep:

ds = xarray.open_dataset(onefile)  # onefile: path to any one of the files (placeholder)
varlist = list(ds.variables)
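
The drop list can also be built programmatically rather than edited by hand. The sketch below is one possible way to do that, not the answerer's exact method: the helper names names_to_drop and open_without_extras are hypothetical, it assumes every file contains the same variables as the first one, and it assumes the time dimension is called "Time". It iterates over ds.data_vars instead of ds.variables so that coordinate variables are not dropped. (In xarray, data_vars controls which variables are concatenated across files rather than which are read, which may explain the behaviour described above.)

```python
import xarray as xr


def names_to_drop(ds, keep):
    """List every data variable in ds that is not named in keep."""
    return [name for name in ds.data_vars if name not in keep]


def open_without_extras(paths, keep):
    """Open all files in paths, skipping unwanted variables at read time."""
    # Inspect only the first file; assumes the others have the same variables.
    with xr.open_dataset(paths[0]) as first:
        drop = names_to_drop(first, keep)
    return xr.open_mfdataset(
        paths,
        drop_variables=drop,  # forwarded to open_dataset for each file
        combine="nested",     # assumes paths are already in time order
        concat_dim="Time",    # assumed name of the time dimension
    )
```

Here paths would be a sorted list of file names and keep a list such as ["T2"].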

Tags:

python-xarray

Timothy W. Hilton
