I am trying to work with data from very large netCDF files (~400 GB each). Each file has a few variables, all much larger than the system memory (e.g. 180 GB vs 32 GB of RAM). I am trying to use numpy and netCDF4-python to do some operations on these variables, copying one slice at a time and operating on that slice. Unfortunately, it is taking a very long time just to read each slice, which is killing performance.
For example, one of the variables is an array of shape (500, 500, 450, 300). I want to operate on the slice [:, :, 0], so I do the following:
import netCDF4 as nc
f = nc.Dataset('myfile.ncdf', 'r+')
myvar = f.variables['myvar']
myslice = myvar[:, :, 0]  # this read is the slow step
But the last step takes a really long time (~5 minutes on my system). If, for example, I save a variable of shape (500, 500, 300) in the netCDF file, then a read of the same size takes only a few seconds.
Is there any way I can speed this up? An obvious path would be to transpose the array so that the indices I am selecting come first, but the array is too large to transpose in memory, and attempting it seems even slower given that a single read already takes so long. What I would like is a quick way to read a slice of a netCDF file, in the fashion of the Fortran interface's get_vara function, or some way of efficiently transposing the array.
You can transpose netCDF variables too large to fit in memory by using the nccopy utility, which is documented here:
http://www.unidata.ucar.edu/netcdf/docs/guide_nccopy.html
The idea is to "rechunk" the file by specifying the shapes of the chunks (multidimensional tiles) you want for its variables. You can specify how much memory to use as a copy buffer and how much to use for chunk caches, but it is not clear how to divide memory optimally between these uses, so you may just have to try some examples and time them. Rather than completely transposing a variable, you probably want to "partially transpose" it by specifying chunks that hold a lot of data along the big dimensions of your slice and only a few values along the other dimensions.
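As a rough sketch (not from your file, so treat everything here as an assumption): suppose the four dimensions of your (500, 500, 450, 300) variable are named x, y, z, and t (check the real names with ncdump -h), and the slice [:, :, 0] fixes z. Then a partial transpose could use big chunks along x, y, and t and chunks of size 1 along z. Invoked from Python, this might look like the following; the chunk sizes, memory settings, and output file name are illustrative guesses to tune, not recommended values:

import subprocess

# Rechunk with nccopy (assumed to be on the PATH). Dimension names and
# sizes below are hypothetical -- substitute the ones from ncdump -h.
subprocess.run(
    [
        "nccopy",
        "-c", "x/100,y/100,z/1,t/300",   # large tiles along x, y, t; thin along z
        "-m", "4G",                      # size of the copy buffer
        "-h", "4G",                      # size of the chunk cache
        "myfile.ncdf",                   # input file
        "myfile_rechunked.ncdf",         # output file (hypothetical name)
    ],
    check=True,
)

With chunks of size 1 along z, reading myvar[:, :, 0] from the rechunked file only touches the chunks at z index 0 instead of scanning the whole variable; the tradeoff is the one-time cost of rewriting the ~400 GB file.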