
Importing and decoding dataset in xarray to avoid conflicting _FillValue and missing_value

When using xarray's open_dataset or open_mfdataset to load a NARR NetCDF dataset (e.g. ftp://ftp.cdc.noaa.gov/Datasets/NARR/monolevel/air.2m.2010.nc), xarray raises an error about "conflicting _FillValue and missing_value".

Entering:

ds = xarray.open_dataset('air.2m.2010.nc')

yields this error:

ValueError: ('Discovered conflicting _FillValue and missing_value. Considering opening the offending dataset using decode_cf=False, corrected the attributes', 'and decoding explicitly using xray.conventions.decode_cf(ds)')

Opening the file as the message suggests:

ds = xarray.open_dataset('air.2m.2010.nc', decode_cf=False)

works, but the variables, time, coordinates, etc. are (unsurprisingly) left undecoded. Calling xarray.decode_cf(ds) explicitly does not help either, because it raises the same error.

I believe this error arises because the NARR dataset is on a Lambert Conformal grid, so some cells are missing because of the shape of the grid as xarray opens it, and for some reason these missing values conflict with the fill values.

What is the best way to open and decode this file in xarray?

N.B. I have been able to open and decode the file using netcdf4-python, but I would like to do this in xarray to take advantage of its out-of-core computation, which is provided by dask.

csg2136 asked Feb 09 '16 00:02


2 Answers

This issue has been fixed in more recent versions of xarray. Using version 0.12, I get the following:

>>> ds = xr.open_dataset('air.2m.2010.nc')
.../conventions.py:394: SerializationWarning: variable 'air' has multiple fill values {9.96921e+36, -9.96921e+36}, decoding all values to NaN.

In other words, it raises a warning rather than an error, and it masks both fill values as NaN.

So your issue can be fixed by upgrading to a more recent version of xarray.
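If upgrading is not an option, the approach hinted at by the old error message (open with decode_cf=False, correct the attributes, then decode explicitly) might look roughly like the sketch below. It is only an illustration on a small synthetic dataset, not the NARR file itself: the helper name unify_fill_values and the sample values are my own assumptions, and np.where loads each affected variable into memory, so it would need adapting for large dask-backed data.

```python
import numpy as np
import xarray as xr


def unify_fill_values(raw):
    """Return a copy of an undecoded dataset that keeps only `_FillValue`.

    Hypothetical helper: cells equal to `missing_value` are rewritten to
    `_FillValue`, and the `missing_value` attribute is dropped.
    """
    fixed = raw.copy()
    for name, var in raw.variables.items():
        if "_FillValue" not in var.attrs or "missing_value" not in var.attrs:
            continue
        attrs = dict(var.attrs)
        fill = attrs["_FillValue"]
        missing = attrs.pop("missing_value")
        # Rewrite cells flagged by missing_value so they carry _FillValue instead.
        data = np.where(var.values == missing, fill, var.values)
        fixed[name] = xr.Variable(var.dims, data, attrs=attrs)
    return fixed


# Synthetic stand-in for a file opened with decode_cf=False (made-up values).
raw = xr.Dataset(
    {
        "air": (
            ("y", "x"),
            np.array([[280.0, 9.96921e36], [-9.96921e36, 275.5]], dtype="float32"),
            {
                "_FillValue": np.float32(9.96921e36),
                "missing_value": np.float32(-9.96921e36),
            },
        )
    }
)

# With a single fill attribute left, explicit decoding masks both kinds of cell.
decoded = xr.decode_cf(unify_fill_values(raw))
```

For a real file, raw would instead come from xr.open_dataset('air.2m.2010.nc', decode_cf=False).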

Ryan answered Oct 12 '22 19:10


I was able to solve a similar issue I was having with NARR data from the same source and xarray, but only for the time variable. I did not have issues with the other variables.

I am sure there are much easier ways to do this (I am still pretty new to Python and xarray), but I ended up taking the time variable and its values from the dataset(s) I was interested in, creating a new dataset from them, decoding the time there, and then replacing the time variable in my original dataset with the decoded one.

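import xarray as xr

# Open without CF decoding so the problematic attributes are not interpreted.
test = xr.open_mfdataset(r'evap*nc', decode_cf=False)

# Inspect the raw time variable and its units.
t_unit = test.variables['time']
t_unit.attrs['units']
# u'hours since 1800-1-1 00:00:0.0'

# Rebuild time with only a clean units attribute, then decode it on its own.
attrs = {'units': 'hours since 1800-01-01'}
ds = xr.Dataset({'time': ('time', t_unit.values, attrs)})
ds = xr.decode_cf(ds)

# Put the decoded time values back into the original dataset.
test.update({'time': ('time', ds['time'].values)})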
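For anyone who wants to try the same idea without downloading the NARR files, here is a minimal sketch on a small synthetic dataset. The variable name evap, the sample values and the use of assign_coords for the last step are my own choices for illustration, so treat it as one possible way to do it rather than the definitive one.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for NARR data opened with decode_cf=False (made-up values).
raw = xr.Dataset(
    {"evap": (("time",), np.array([0.1, 0.2, 0.3]))},
    coords={
        "time": (
            "time",
            np.array([1840824.0, 1840827.0, 1840830.0]),
            {"units": "hours since 1800-1-1 00:00:0.0"},
        )
    },
)

# Decode only the time axis, using a cleaned-up units string.
clean = xr.Dataset(
    {"time": ("time", raw["time"].values, {"units": "hours since 1800-01-01"})}
)
decoded_time = xr.decode_cf(clean)["time"]

# Swap the decoded values back in as the time coordinate.
fixed = raw.assign_coords(time=decoded_time.values)
```

After this, fixed['time'] holds datetime values, so label-based selection such as fixed.sel(time='2010-01-01') should work as usual.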

Please let me know if you find an easier way! I don't have this issue with the study datasets I am currently using from another source, but would be curious as to how others solved this issue with the ESRL NARR data.

Maria Molina answered Oct 12 '22 19:10