How to index a NetCDF file very quickly

So I am trying to index a NetCDF file to get stream flow rate data in a certain grid cell. The NetCDF file I am using has the following characteristics:

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
CDI: Climate Data Interface version 1.6.4 (http://code.zmaw.de/projects/cdi)
Conventions: CF-1.4
dimensions(sizes): lon(3600), lat(1800), time(31)
variables(dimensions): float64 lon(lon), float64 lat(lat), float64 time(time), float32 dis(time,lat,lon)

I have 35+ years of this data, and I am trying to extract the values from an individual grid cell and build a time series to compare against a different model's forecasts. The code I am currently using to extract data from a grid cell is below.

from netCDF4 import Dataset
import numpy as np

# Open one month of daily discharge data.
root_grp = Dataset(r'C:\Users\wadear\Desktop\ERAIland_daily_dis_198001.nc')
dis = root_grp.variables['dis']

# Round the coordinate arrays so exact cell centres can be looked up.
lat = np.round(root_grp.variables['lat'][:], decimals=2).tolist()
lon = np.round(root_grp.variables['lon'][:], decimals=2).tolist()
time = root_grp.variables['time'].shape[0]

lat_index = lat.index(27.95)
lon_index = lon.index(83.55)

# Print the time series one value at a time.
for i in range(time):
    print(dis[i][lat_index][lon_index])

Right now this feels really slow, and over a 35+ year timespan it will take a long time; extracting multiple grid cells will make the run time build up even further.

Is there a tool to speed up this process with faster I/O or indexing?

Thanks!

asked Mar 07 '23 by pythonweb

2 Answers

You should get a big time saving if you remove the loop over time and access the entire time series at once, i.e.

dis[:,lat_index,lon_index]
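The difference can be seen even with a plain NumPy array standing in for the `dis` variable (the array shape below is a small stand-in for the real 31 x 1800 x 3600 grid); with a real netCDF4 variable, the looped version additionally pays one disk read per time step, while the slice reads the whole series in a single request:

```python
import numpy as np

# Stand-in for the 'dis' variable: (time, lat, lon), much smaller than
# the real grid so the example runs quickly.
rng = np.random.default_rng(0)
dis = rng.random((31, 18, 36)).astype(np.float32)

lat_index, lon_index = 5, 7

# Slow pattern: one indexed access per time step.
looped = np.array([dis[i][lat_index][lon_index] for i in range(dis.shape[0])])

# Fast pattern: pull the whole time series in a single slice.
sliced = dis[:, lat_index, lon_index]

assert np.array_equal(looped, sliced)
print(sliced.shape)  # (31,)
```

Both give identical values; only the access pattern changes.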

Further speed gains can be obtained if you apply chunking in the time dimension; look up the documentation for nccopy. If you need to access the time series repeatedly, this is worth doing. You may also wish to concatenate some of your NetCDF files before chunking, e.g. monthly -> annual, which can be done with the ncrcat utility.
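As a sketch of that workflow (file names and chunk sizes are illustrative; tune the time chunk to your file lengths):

```shell
# Concatenate the monthly files for one year along the record (time)
# dimension.
ncrcat ERAIland_daily_dis_1980??.nc ERAIland_daily_dis_1980.nc

# Rechunk so each chunk spans the whole year in time but a single grid
# cell in space, making a point time-series read a one-chunk operation.
nccopy -c time/366,lat/1,lon/1 ERAIland_daily_dis_1980.nc ERAIland_daily_dis_1980_chunked.nc
```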

See also Chunking Data: Why it Matters.

answered Mar 10 '23 by Robert Davy

Why not simply extract the point with CDO first and then read in the point data:

cdo remapnn,lon=83.55/lat=27.95 input.nc point_output.nc

On Ubuntu, if you don't have CDO installed, you can install it with

sudo apt-get install cdo 
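For the full 35+ year record, one way to apply this is to extract the point from each monthly file and then merge the results into a single time series (file names are illustrative):

```shell
# Extract the nearest-neighbour grid cell from every monthly file.
for f in ERAIland_daily_dis_*.nc; do
    cdo remapnn,lon=83.55/lat=27.95 "$f" "point_${f}"
done

# Merge the per-file point data into one continuous time series.
cdo mergetime point_ERAIland_daily_dis_*.nc point_timeseries.nc
```

The resulting file is tiny (one value per day), so reading it back into Python is effectively instant.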
answered Mar 10 '23 by Adrian Tompkins