I'm trying to read a time series of a single WRF output variable. The time series is spread, one timestamp per file, across more than 5000 netCDF files. Each file contains roughly 200 variables.
Is there a way to call xarray.open_mfdataset() for only the variable I'm interested in? I can specify a single variable by passing a list to the 'data_vars' argument, but it still reads everything that the 'minimal' case would. For my files the 'minimal' case includes almost everything, so this is relatively slow.
Is my best bet to create a single netCDF file containing my variable of interest with something like ncrcat, or is there a more streamlined way to do this entirely within xarray (or some other python tool)?
My netCDF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().
Another option is to pass a function that selects the variables to keep via the "preprocess" keyword argument, e.g.:
preprocess=lambda ds: ds[variablelist]
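A fuller sketch of the preprocess approach might look like the following. The helper names keep_only and open_selected are made up for illustration, and the "Time" record dimension together with combine="nested" are assumptions that match typical WRF output, so adjust them to your files. Note that preprocess is applied to each file after it has been opened, so it trims what gets concatenated rather than skipping the per-file open.

```python
from functools import partial

import xarray as xr


def keep_only(ds, names):
    """Return `ds` restricted to the variables listed in `names`."""
    return ds[names]


def open_selected(paths, names, concat_dim="Time"):
    """Lazily open many files, keeping only `names` from each one.

    `concat_dim` is an assumed record-dimension name ("Time" in WRF output).
    """
    return xr.open_mfdataset(
        paths,
        # applied to every file right after it is opened
        preprocess=partial(keep_only, names=list(names)),
        combine="nested",       # concatenate in the order the paths are given
        concat_dim=concat_dim,
    )
```

For example, open_selected(sorted(glob.glob("wrfout_d01_*")), ["T2"]) would return a dataset holding only T2; the file pattern and variable name here are placeholders.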
I'm not sure why providing the data_vars= argument still reads all the data; I experienced the same issue reading WRF output. My workaround was to make a list of all the variables I didn't need (all 200+) and feed that to the drop_variables= argument. To build that list, get the names of all variables from one file and then delete or comment out the ones you want to keep.
varlist = list(ds.variables)  # ds is one of the files opened with xarray.open_dataset()
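Rather than editing the list by hand, the drop list can also be built programmatically. This is only a sketch: names_to_drop and open_without are hypothetical helper names, it assumes every file has the same variables as the first one, and the "Time" dimension and combine="nested" are assumptions based on typical WRF output. Because it starts from ds.variables as above, any coordinate variables not listed in keep are dropped as well.

```python
import xarray as xr


def names_to_drop(sample_path, keep):
    """List every variable in one sample file that is not in `keep`."""
    keep = set(keep)
    with xr.open_dataset(sample_path) as sample:
        return [name for name in sample.variables if name not in keep]


def open_without(paths, keep, concat_dim="Time"):
    """Open many files, dropping everything except `keep` at read time."""
    # assumes the first file is representative of all the others
    drop = names_to_drop(paths[0], keep)
    return xr.open_mfdataset(
        paths,
        drop_variables=drop,    # forwarded to xarray.open_dataset for each file
        combine="nested",       # concatenate in the order the paths are given
        concat_dim=concat_dim,  # assumed record-dimension name
    )
```

A call such as open_without(paths, ["T2"]) would then keep only T2, where paths is your sorted list of files and T2 stands in for the variable of interest.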