Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loop through netcdf files and run calculations - Python or R

Tags:

python

r

netcdf

This is my first time using netCDF and I'm trying to wrap my head around working with it.

I have multiple version 3 netcdf files (NOAA NARR air.2m daily averages for an entire year). Each file spans a year between 1979 - 2012. They are 349 x 277 grids with approximately a 32km resolution. Data was downloaded from here.

The dimension is time (hours since 1/1/1800) and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example

    Day 1 = +4 degrees, accumulated days = 0
    Day 2 = -1 degrees, accumulated days = 1
    Day 3 = -2 degrees, accumulated days = 2
    Day 4 = -4 degrees, accumulated days = 3
    Day 5 = +2 degrees, accumulated days = 0
    Day 6 = -3 degrees, accumulated days = 1

I need to store this data in a new netcdf file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous days value, and based on that, output a value to a new netcdf file with the exact same dimension and variable.... or perhaps just add another variable to the original netcdf file with the output I'm looking for.

Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the file is 2.3gb.

Thanks for the input.

My current progress in python:

import numpy
import netCDF4
#Change my working DIR
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
  print(a)

#output =
     lat
     long
     x
     y
     Lambert_Conformal
     time
     time_bnds
     air

f.variables['air'][1, 1, 1]
#Output
     298.37473

To help me understand this better what type of data structure am I working with? Is ['air'] the key in the above example and [1,1,1] are also keys? to get the value of 298.37473. How can I then loop through [1,1,1]?

like image 722
mkmitchell Avatar asked Sep 06 '13 19:09

mkmitchell


3 Answers

You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So you code would look like this:

from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()

atemp = f.variables['air']
print atemp

ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny,nx),dtype=int)

for i in xrange(ntimes):
    cold_days += atemp[i,:,:].data-273.15 < 0

pcolormesh(cold_days)
colorbar()

generated image of cold days

And here's one way to write the file (there might be easier ways):

# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

cold_days_v = nco.createVariable('cold_days', 'i4',  ( 'y', 'x'))
cold_days_v.units='days'
cold_days_v.long_name='total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon','f4',('y','x'))
lato = nco.createVariable('lat','f4',('y','x'))
xo = nco.createVariable('x','f4',('x'))
yo = nco.createVariable('y','f4',('y'))
lco = nco.createVariable('Lambert_Conformal','i4')

# copy all the variable attributes from original file
for var in ['lon','lat','x','y','Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono[:]=f.variables['lon'][:]
lato[:]=f.variables['lat'][:]
xo[:]=f.variables['x'][:]
yo[:]=f.variables['y'][:]

#  write the cold_days data
cold_days_v[:,:]=cold_days

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6'
nco.close()

If I try looking at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay: enter image description here Also note that here I just downloaded two of the datasets for testing, so I used

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')

as an example. For all the data, you could use

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')

or

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')
like image 116
Rich Signell Avatar answered Oct 26 '22 22:10

Rich Signell


Here is an R solution.

infiles <- list.files("data", pattern = "nc", full.names = TRUE, include.dirs = TRUE)

outfile <- "data/air.colddays.nc"     

library(raster)

r <- raster::stack(infiles) 
r <- sum((r - 273.15) < 0)

plot(r)

enter image description here

like image 39
jsta Avatar answered Oct 26 '22 22:10

jsta


I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't provide the solution to the exact question posed. The question seems to want the length of each continuous period of temperatures below zero (note in the question the counter resets if the temperature exceeds zero), which can be important for climate applications (e.g. for farming) whereas the accepted solution only gives the total number of days in a year that the temperature is below zero. If this is really what mkmitchell wants (it has been accepted as the answer) then it can be done in from the command line in cdo without having to worry about NETCDF input/output:

 cdo timsum -lec,273.15 in.nc out.nc

so a looped script would be:

files=`ls *.nc` # pick up all the netcdf files in a directory
for file in $files ; do
    # I use 273.15 as from the question seems T is in Kelvin 
    cdo timsum -lec,273.15 $file ${file%???}_numdays.nc
done 

If you then want the total number over the whole period you can then cat the _numdays files instead which are much smaller:

cdo cat *_numdays.nc total.nc 
cdo timsum total.nc total_below_zero.nc 

But again, the question seems to want accumulated days per event, which is different, but not provided by the accepted answer.

like image 23
Adrian Tompkins Avatar answered Oct 26 '22 22:10

Adrian Tompkins