I'm learning how to produce netCDF files from pandas DataFrames using xarray. I've followed several tutorials and SO questions, such as Add 'constant' dimension to xarray Dataset, but I still have an issue: I can't get Date_Time, lat and lon set as dimensions. When I run ncdump, the dimensions are not correct.
My initial approach imports a txt file into a pandas DataFrame, then uses xarray to write it out as netCDF:
import pandas as pd
import xarray
# Import data from the whitespace-delimited text file
colnames1 = ['Date', 'Time', 'latitude', 'longitude', 'Status', 'depth']
# on_bad_lines='skip' replaces error_bad_lines=False (removed in pandas 2.0);
# sep=r'\s+' replaces the deprecated delim_whitespace=True
df2 = pd.read_csv('test.txt', header=0, names=colnames1, sep=r'\s+', on_bad_lines='skip')
# create an xarray Dataset from the pandas DataFrame
xr = xarray.Dataset.from_dataframe(df2)
# add variable attribute metadata (CF units for latitude/longitude)
xr['latitude'].attrs = {'units': 'degrees_north', 'long_name': 'Latitude'}
xr['longitude'].attrs = {'units': 'degrees_east', 'long_name': 'Longitude'}
xr['depth'].attrs = {'units': 'm', 'long_name': 'depth'}
# add global attribute metadata
xr.attrs = {'Conventions': 'CF-1.6', 'title': 'Data', 'summary': 'Data generated'}
print(xr)
# save to netCDF
xr.to_netcdf('test.nc')
where the data is:
Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
This runs, but the dimensions are not what I want: everything sits along a single default index dimension (see below):
<xarray.Dataset>
Dimensions: (index: 8)
Coordinates:
* index (index) int64 0 1 2 3 4 5 6 7
Data variables:
Date (index) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Time (index) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (index) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (index) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (index) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (index) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
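The index dimension comes from the DataFrame's default integer index: Dataset.from_dataframe turns the DataFrame index into the Dataset's dimension(s) and every column into a data variable along it. A minimal sketch reproducing this with the sample rows above, read from an in-memory string (io.StringIO stands in for test.txt):

```python
import io

import pandas as pd
import xarray as xr

# The sample rows from above; io.StringIO stands in for test.txt
SAMPLE = """Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
"""

colnames1 = ['Date', 'Time', 'latitude', 'longitude', 'Status', 'depth']
# header=0 plus names= replaces the file's own header row with colnames1
df2 = pd.read_csv(io.StringIO(SAMPLE), header=0, names=colnames1, sep=r'\s+')

ds = xr.Dataset.from_dataframe(df2)
# The unnamed default RangeIndex becomes one dimension called 'index',
# and all six columns become data variables along it
print(dict(ds.sizes))  # {'index': 8}
print(list(ds.data_vars))
```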
If I set Date (or a merged Date_Time column) as the DataFrame index, Date/Time is correctly picked up as a dimension:
<xarray.Dataset>
Dimensions: (Date: 8)
Coordinates:
* Date (Date) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Data variables:
Time (Date) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
latitude (Date) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
longitude (Date) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
Status (Date) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
depth (Date) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
title: Data
summary: Data generated
Conventions: CF-1.6
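With a single-column index, that column becomes the one dimension, named after the index. A sketch of the merged Date_Time variant, assuming Date and Time are combined with pd.to_datetime so the coordinate holds real datetimes rather than strings:

```python
import io

import pandas as pd
import xarray as xr

SAMPLE = """Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
"""

colnames1 = ['Date', 'Time', 'latitude', 'longitude', 'Status', 'depth']
df2 = pd.read_csv(io.StringIO(SAMPLE), header=0, names=colnames1, sep=r'\s+')

# Merge the Date and Time strings into one datetime64 column and index on it
df2['Date_Time'] = pd.to_datetime(df2['Date'] + ' ' + df2['Time'])
df2 = df2.drop(columns=['Date', 'Time']).set_index('Date_Time')

ds = xr.Dataset.from_dataframe(df2)
# The index name becomes the dimension name; latitude and longitude are
# still ordinary data variables along that one dimension
print(dict(ds.sizes))  # {'Date_Time': 8}
```

Because all eight rows share one timestamp, the Date_Time coordinate repeats the same value eight times: it is a dimension in name, but it does not make latitude and longitude dimensions.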
But if I set the index on Date_Time, lat and lon, it reverts to the plain (index) dimension. I'd appreciate pointers on getting the dimensions set. With the netCDF module one would create a dimension with lat = dataset.createDimension('lat', 73). The SO question "add dimension to an xarray DataArray" doesn't help either. Maybe I'm missing something, or it's just my inexperience. I'd like to get to the point where ncdump produces something similar to this:
NetCDF dimension information:
Name: lat
size: 73
type: dtype('float32')
units: u'degrees_north'
actual_range: array([ 90., -90.], dtype=float32)
long_name: u'Latitude'
standard_name: u'latitude'
axis: u'Y'
Name: lon
size: 144
type: dtype('float32')
units: u'degrees_east'
long_name: u'Longitude'
actual_range: array([ 0. , 357.5], dtype=float32)
standard_name: u'longitude'
axis: u'X'
Name: time
size: 366
type: dtype('float64')
units: u'hours since 1-1-1 00:00:0.0'
long_name: u'Time'
actual_range: array([ 17628096., 17636856.])
delta_t: u'0000-00-01 00:00:00'
standard_name: u'time'
axis: u'T'
avg_period: u'0000-00-01 00:00:00'
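In xarray, each dimension coordinate is written as a netCDF dimension plus a coordinate variable of the same name, so per-dimension metadata like the listing above goes into the attributes of those coordinates. A sketch on a small hypothetical regular grid (the lat/lon values, the dimension names time/lat/lon and the reference date in the time units are illustrative assumptions, not taken from the data):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical regular grid, only to illustrate the attributes
lat = np.array([-29.04, -29.035, -29.03], dtype='float32')
lon = np.array([31.05, 31.06, 31.07], dtype='float32')
time = pd.to_datetime(['2017-09-05 13:01:59'])
depth = np.zeros((time.size, lat.size, lon.size))

ds = xr.Dataset(
    {'depth': (('time', 'lat', 'lon'), depth)},
    coords={'time': time, 'lat': lat, 'lon': lon},
)

# CF-style coordinate attributes, mirroring the listing above
ds['lat'].attrs.update(units='degrees_north', long_name='Latitude',
                       standard_name='latitude', axis='Y')
ds['lon'].attrs.update(units='degrees_east', long_name='Longitude',
                       standard_name='longitude', axis='X')
ds['time'].attrs.update(long_name='Time', standard_name='time', axis='T')
# For datetime64 data the units go in .encoding, not .attrs: xarray converts
# the values to numbers "since" this reference date when writing the file
ds['time'].encoding['units'] = 'hours since 2017-01-01 00:00:00'
ds['depth'].attrs.update(units='m', long_name='depth')

# ds.to_netcdf('test.nc') would then write time, lat and lon as dimensions
```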
Alternatively, should I convert the DataFrame columns to NumPy arrays and use the netCDF module directly? Many thanks in advance. I also tried something like this, but I doubt it's on the right path:
# add dimensions
# d = {}
# d['time'] = ('time', df2.Time)
# d['latitude'] = ('latitude', df2.latitude)
# d['longitude'] = ('longitude', df2.longitude)
# d['var'] = (['time', 'latitude', 'longitude', 'Depth'], xr)
# xr = xarray.Dataset(d)
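In the dict form, each entry is a (dimension names, values) tuple, and the number of names must match the number of dimensions of the values. That is why the 'var' line cannot work: it lists four dimension names but passes a Dataset rather than a 4-D array. A minimal sketch of a valid dict-based construction for point data, using three of the sample rows entered by hand and a hypothetical observation dimension named obs:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three of the sample rows, entered by hand for illustration
lat = np.array([-29.034083, -29.039367, -29.036650])
lon = np.array([31.068567, 31.059150, 31.059200])
depth = np.array([0.0, 0.0, 0.0])
time = pd.to_datetime(['2017-09-05 13:01:59'] * 3)

# Each entry is (dimension names, values); 1-D columns of point data all
# share one dimension ('obs', a hypothetical name), with time, latitude
# and longitude stored as coordinates along it
ds = xr.Dataset(
    data_vars={'depth': ('obs', depth)},
    coords={
        'time': ('obs', time),
        'latitude': ('obs', lat),
        'longitude': ('obs', lon),
    },
)
print(dict(ds.sizes))  # {'obs': 3}
```

To make latitude and longitude true dimensions instead, a variable needs a 2-D or 3-D array whose axes line up with those dimension names.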
This is easiest to achieve by combining Time, grid_latitude and grid_longitude into a pandas.MultiIndex on the DataFrame with set_index() before converting into an xarray Dataset.
For example:
# note that pandas.DataFrame's to_xarray() method is equivalent to
# xarray.Dataset.from_dataframe()
ds = df.set_index(['Time', 'grid_latitude', 'grid_longitude']).to_xarray()
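A runnable version of the same idea using the sample rows from the question; here read_csv keeps the file's own header, so the columns are named grid_latitude and grid_longitude as in the snippet above:

```python
import io

import pandas as pd

SAMPLE = """Date Time grid_latitude grid_longitude Status depth
2017-09-05 13:01:59 -29.034083 31.068567 2.0 0.0
2017-09-05 13:01:59 -29.039367 31.059150 2.0 0.0
2017-09-05 13:01:59 -29.036650 31.059200 3.0 0.0
2017-09-05 13:01:59 -29.036750 31.065417 7.0 100.0
2017-09-05 13:01:59 -29.039317 31.056050 7.0 100.0
2017-09-05 13:01:59 -29.034000 31.062367 3.0 0.0
2017-09-05 13:01:59 -29.036517 31.049900 3.0 0.0
2017-09-05 13:01:59 -29.031100 31.050000 3.0 0.0
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')
# Each level of the MultiIndex becomes one dimension (requires xarray)
ds = df.set_index(['Time', 'grid_latitude', 'grid_longitude']).to_xarray()

print(dict(ds.sizes))
print(ds['depth'].dims)  # ('Time', 'grid_latitude', 'grid_longitude')
```

from_dataframe fills the full product of the index levels, so with 1 time, 8 latitudes and 8 longitudes only 8 of the 64 cells hold data and the rest are NaN. For scattered points like these, keeping a single observation dimension with latitude and longitude as coordinates along it may be the more compact layout. Combining Date and Time with pd.to_datetime before set_index would give a proper datetime time coordinate.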