
xarray.open_mfdataset for a small subset of variables

I'm trying to read a timeseries of a single WRF output variable. The time series is distributed, one timestamp per file, across more than 5000 netCDF files. Each file contains roughly 200 variables.

Is there a way to call xarray.open_mfdataset() for only the variable I'm interested in? I can specify a single variable by providing a list to the 'data_vars' argument, but it still reads everything for the 'minimal' case. For my files the 'minimal' case includes almost everything and is thus relatively slow.

Is my best bet to create a single netCDF file containing my variable of interest with something like ncrcat, or is there a more streamlined way to do this entirely within xarray (or some other python tool)?

My netCDF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

asked Oct 26 '25 10:10 by Timothy W. Hilton


2 Answers

Another option is to pass a preprocessing function through the "preprocess" keyword argument of open_mfdataset. It is applied to each file's dataset before concatenation, so it can select the variables to keep, e.g.:

preprocess=lambda ds: ds[variablelist]
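A fuller sketch of this approach might look like the following. The helper names `keep_only` and `open_timeseries` are hypothetical, and the `Time` concatenation dimension is an assumption based on typical WRF output, so adjust both to your files. As far as I understand it, `preprocess` runs on each file after it has been opened lazily, so only the selected variables are carried into the concatenation.

```python
from functools import partial


def keep_only(ds, names):
    """Select the requested variables from one per-file dataset."""
    # Indexing with a list returns a Dataset; a bare string would
    # return a single DataArray instead.
    return ds[list(names)]


def open_timeseries(paths, names, concat_dim="Time"):
    """Open many files lazily, keeping only `names` from each one."""
    import xarray as xr  # needs xarray plus dask for open_mfdataset

    return xr.open_mfdataset(
        paths,
        preprocess=partial(keep_only, names=names),
        combine="nested",       # concatenate files in the order given
        concat_dim=concat_dim,  # assumed name of the time dimension
    )
```

With a sorted list of file paths, something like `open_timeseries(paths, ["T2"])` would return a lazily loaded dataset holding only that variable (here `T2` is just an example name).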
answered Oct 29 '25 05:10 by momme


I'm not sure why providing the data_vars= argument still reads all the data, but I ran into the same issue reading WRF output. My workaround was to make a list of all the variables I didn't need (200+) and pass it to the drop_variables= argument. To build that list, open one file as ds, print the list of all its variables, and then delete or comment out the ones you want to keep:

varlist = list(ds.variables)
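Instead of editing the list by hand, the drop list could be built programmatically. This is only a sketch: `names_to_drop` and `open_without_unwanted` are hypothetical helpers, the `Time` dimension name is an assumption based on typical WRF output, and the code assumes every file has the same variables as the first one. It reads `data_vars` rather than `variables` so that coordinate variables are not dropped by accident.

```python
def names_to_drop(all_names, keep):
    """Return every name in `all_names` that is not in `keep`, in order."""
    keep = set(keep)
    return [name for name in all_names if name not in keep]


def open_without_unwanted(paths, keep, concat_dim="Time"):
    """Open `paths` lazily, skipping data variables not listed in `keep`."""
    import xarray as xr  # needs xarray plus dask for open_mfdataset

    # Read the variable names from the first file only; this assumes
    # all files share the same set of variables.
    with xr.open_dataset(paths[0]) as first:
        drop = names_to_drop(first.data_vars, keep)

    return xr.open_mfdataset(
        paths,
        drop_variables=drop,    # forwarded to open_dataset for each file
        combine="nested",       # concatenate files in the order given
        concat_dim=concat_dim,  # assumed name of the time dimension
    )
```

Because `drop_variables` is applied when each file is opened, the unwanted variables should never be decoded at all, which is likely why this route is faster than relying on `data_vars` (that argument governs which variables are concatenated, not which are read).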
answered Oct 29 '25 07:10 by bwc


