I am trying to convert a .csv file to a netCDF4 file in Python, but I cannot work out how to store data from a .csv table in netCDF. My main question is how to declare variables from the columns in a workable netCDF4 format. Everything I have found goes the other way, extracting data from netCDF4 to .csv or ASCII. I have included the sample data, the sample code, and the error I get when declaring the arrays. Any help would be much appreciated.
The sample table is below:
Station Name  Country  Code   Lat    Lon     mn.yr   temp1  temp2  temp3  hpa
Somewhere     US       12340  35.52  23.358  1.19    -8.3   -13.1  -5     69.5
Somewhere     US       12340                 2.1971  -10.7  -13.9  -7.9   27.9
Somewhere     US       12340                 3.1971  -8.4   -13    -4.3   90.8
My sample code is:
#!/usr/bin/env python
import scipy
import numpy
import netCDF4
import csv
from numpy import arange, dtype
#Declare empty arrays
v1 = []
v2 = []
v3 = []
v4 = []
# Open csv file and declare variable for arrays for each heading
f = open('station_data.csv', 'r').readlines()
for line in f[1:]:
    fields = line.split(',')
    v1.append(fields[0]) #station
    v2.append(fields[1])#country
    v3.append(int(fields[2]))#code
    v4.append(float(fields[3]))#lat
    v5.append(float(fields[3]))#lon
    #more variables included but this is just an abridged list
print v1
print v2
print v3
print v4
#convert to netcdf4 framework that works as a netcdf
ncout = netCDF4.Dataset('station_data.nc','w')
# latitudes and longitudes. Include NaN for missing numbers
lats_out = -25.0 + 5.0*arange(v4,dtype='float32')
lons_out = -125.0 + 5.0*arange(v5,dtype='float32')
# output data.
press_out = 900. + arange(v4*v5,dtype='float32') # 1d array
press_out.shape = (v4,v5) # reshape to 2d array
temp_out = 9. + 0.25*arange(v4*v5,dtype='float32') # 1d array
temp_out.shape = (v4,v5) # reshape to 2d array
# create the lat and lon dimensions.
ncout.createDimension('latitude',v4)
ncout.createDimension('longitude',v5)
# Define the coordinate variables. They will hold the coordinate information
lats = ncout.createVariable('latitude',dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',dtype('float32').char,('longitude',))
# Assign units attributes to coordinate var data. This attaches a text attribute to each of the coordinate variables, containing the units.
lats.units = 'degrees_north'
lons.units = 'degrees_east'
# write data to coordinate vars.
lats[:] = lats_out
lons[:] = lons_out
# create the pressure and temperature variables
press = ncout.createVariable('pressure',dtype('float32').char,('latitude','longitude'))
temp = ncout.createVariable('temperature',dtype('float32').char,('latitude','longitude'))
# set the units attribute.
press.units = 'hPa'
temp.units = 'celsius'
# write data to variables.
press[:] = press_out
temp[:] = temp_out
ncout.close()
f.close()
Error:
Traceback (most recent call last):
  File "station_data.py", line 33, in <module>
    v4.append(float(fields[3]))#lat
ValueError: could not convert string to float:
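The ValueError above is what float() raises for an empty string, which is what a blank Lat or Lon cell becomes once the line is split. A minimal sketch of one way around it, using only the standard library, is to map blank fields to NaN while parsing. The comma-separated layout, the lowercase column names, and the to_float helper are assumptions made for illustration; they are not taken from the actual station_data.csv.

```python
import csv
import io
import math


def to_float(text):
    """Hypothetical helper: convert a CSV field to float, blank -> NaN."""
    text = text.strip()
    return float(text) if text else math.nan


# Assumed layout of the sample table: comma-separated, with blank
# lat/lon cells on the second and third rows.
sample = io.StringIO(
    "station,country,code,lat,lon,mn.yr,temp1,temp2,temp3,hpa\n"
    "Somewhere,US,12340,35.52,23.358,1.19,-8.3,-13.1,-5,69.5\n"
    "Somewhere,US,12340,,,2.1971,-10.7,-13.9,-7.9,27.9\n"
    "Somewhere,US,12340,,,3.1971,-8.4,-13,-4.3,90.8\n"
)

lats, lons, temps = [], [], []
for row in csv.DictReader(sample):
    # Blank cells become NaN instead of raising ValueError.
    lats.append(to_float(row["lat"]))
    lons.append(to_float(row["lon"]))
    temps.append(to_float(row["temp1"]))
```

With a real file, the io.StringIO object would be replaced by open('station_data.csv', newline='').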
Using xarray to convert netCDF to CSV: open the netCDF file with open_dataset(), convert it to a DataFrame with to_dataframe(), and write that DataFrame to a CSV file with to_csv().
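For reference, the netCDF-to-CSV direction described above might be sketched as follows. The netcdf_to_csv function, the file names, and the tiny demo dataset are all hypothetical and exist only so the example is self-contained; it also assumes a netCDF backend for xarray (such as netCDF4 or scipy) is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr


def netcdf_to_csv(nc_path, csv_path):
    """Open a netCDF file, flatten it to a DataFrame, and write it as CSV."""
    with xr.open_dataset(nc_path) as ds:
        df = ds.to_dataframe()
    df.to_csv(csv_path)
    return df


# Build a small throwaway netCDF file (made-up values) to convert.
tmpdir = tempfile.mkdtemp()
nc_path = os.path.join(tmpdir, "demo.nc")
csv_path = os.path.join(tmpdir, "demo.csv")

demo = xr.Dataset(
    {"temp": ("month", np.array([-8.3, -10.7, -8.4]))},
    coords={"month": np.array([1, 2, 3])},
)
demo.to_netcdf(nc_path)

# Convert it and read the CSV back to inspect the result.
df = netcdf_to_csv(nc_path, csv_path)
round_trip = pd.read_csv(csv_path, index_col="month")
```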
An NC file is a data file created in the netCDF (network Common Data Form) format, a format used for storing multidimensional data in a manner independent of the platforms and disciplines for which it is used.
This is a perfect job for xarray, a Python package whose Dataset object represents the netCDF common data model. Here's an example you can try:
import pandas as pd
import xarray as xr
url = 'http://www.cpc.ncep.noaa.gov/products/precip/CWlink/'
ao_file = url + 'daily_ao_index/monthly.ao.index.b50.current.ascii'
nao_file = url + 'pna/norm.nao.monthly.b5001.current.ascii'
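kw = dict(sep=r'\s+', parse_dates={'dates': [0, 1]},
          header=None, index_col=0, engine='python')
# read into Pandas Series; .squeeze('columns') replaces the squeeze=True
# argument that was removed from read_csv in pandas 2.0
s1 = pd.read_csv(ao_file, **kw).squeeze('columns')
s2 = pd.read_csv(nao_file, **kw).squeeze('columns')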
s1.name='AO'
s2.name='NAO'
# concatenate two Pandas Series into a Pandas DataFrame
df=pd.concat([s1, s2], axis=1)
# create xarray Dataset from Pandas DataFrame
xds = xr.Dataset.from_dataframe(df)
# add variable attribute metadata
xds['AO'].attrs={'units':'1', 'long_name':'Arctic Oscillation'}
xds['NAO'].attrs={'units':'1', 'long_name':'North Atlantic Oscillation'}
# add global attribute metadata
xds.attrs={'Conventions':'CF-1.0', 'title':'AO and NAO', 'summary':'Arctic and North Atlantic Oscillation Indices'}
# save to netCDF
xds.to_netcdf('/usgs/data2/notebook/data/ao_and_nao.nc')
Then running ncdump -h ao_and_nao.nc produces:
netcdf ao_and_nao {
dimensions:
        dates = 782 ;
variables:
        double dates(dates) ;
                dates:units = "days since 1950-01-06 00:00:00" ;
                dates:calendar = "proleptic_gregorian" ;
        double NAO(dates) ;
                NAO:units = "1" ;
                NAO:long_name = "North Atlantic Oscillation" ;
        double AO(dates) ;
                AO:units = "1" ;
                AO:long_name = "Arctic Oscillation" ;

// global attributes:
                :title = "AO and NAO" ;
                :summary = "Arctic and North Atlantic Oscillation Indices" ;
                :Conventions = "CF-1.0" ;
}
Note that you can install xarray using pip, but if you are using the Anaconda Python Distribution, you can install it from the Anaconda.org/conda-forge channel by using:
conda install -c conda-forge xarray
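Applied to the station table from the question, the same DataFrame-to-Dataset route might look like the sketch below. It is only an illustration under stated assumptions: the CSV is taken to be comma-separated with the lowercase column names shown, the "obs" dimension name is made up, and only the numeric columns are written to keep the example short. pandas reads the blank lat/lon cells as NaN on its own, so no manual conversion is needed.

```python
import io
import os
import tempfile

import pandas as pd
import xarray as xr

# Assumed comma-separated version of the sample table in the question.
csv_text = (
    "station,country,code,lat,lon,mn.yr,temp1,temp2,temp3,hpa\n"
    "Somewhere,US,12340,35.52,23.358,1.19,-8.3,-13.1,-5,69.5\n"
    "Somewhere,US,12340,,,2.1971,-10.7,-13.9,-7.9,27.9\n"
    "Somewhere,US,12340,,,3.1971,-8.4,-13,-4.3,90.8\n"
)

# Blank cells are parsed as NaN automatically.
df = pd.read_csv(io.StringIO(csv_text))

# Keep the numeric columns and name the row index; the index name
# becomes the netCDF dimension ("obs" is a hypothetical choice).
numeric = df[["lat", "lon", "temp1", "temp2", "temp3", "hpa"]].copy()
numeric.index.name = "obs"

# Each DataFrame column becomes one netCDF variable along "obs".
xds = xr.Dataset.from_dataframe(numeric)

# Units taken from the attributes set in the question's code.
xds["lat"].attrs = {"units": "degrees_north"}
xds["lon"].attrs = {"units": "degrees_east"}
xds["hpa"].attrs = {"units": "hPa"}
for name in ("temp1", "temp2", "temp3"):
    xds[name].attrs = {"units": "celsius"}

# Write to a temporary location and read one variable back as a check.
out_path = os.path.join(tempfile.mkdtemp(), "station_data.nc")
xds.to_netcdf(out_path)
with xr.open_dataset(out_path) as check:
    reloaded_temp1 = check["temp1"].values.tolist()
```

Running ncdump -h on the resulting file should then list one variable per column along the obs dimension.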