I have a working python program that reads in a number of large netCDF files using the Dataset command from the netCDF4 module. Here is a snippet of the relevant parts:
from netCDF4 import Dataset
import glob
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*')):
ncin = Dataset(infile,'r')
ncin.close()
I want to modify this to read in netCDF files that are gzipped. The files themselves were gzipped after creation; they are not internally compressed (i.e., the files are *.nc.gz). If I were reading in gzipped text files, the command would be:
from netCDF4 import Dataset
import glob
import gzip
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
f = gzip.open(infile, 'rb')
file_content = f.read()
f.close()
After googling around for maybe half an hour and reading through the netCDF4 documentation, the only way I can come up with to do this for netCDF files is:
from netCDF4 import Dataset
import glob
import os
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
os.system('gzip -d ' + infile)
ncin = Dataset(infile[:-3],'r')
ncin.close()
os.system('gzip ' + infile[:-3])
Is it possible to read gzip files with the Dataset command directly? Or without otherwise calling gzip through os?
To install with anaconda (conda) simply type conda install netCDF4 . Alternatively, you can install with pip . To be sure your netCDF4 module is properly installed start an interactive session in the terminal (type python and press 'Enter'). Then import netCDF4 as nc .
Launch WinZip from your start menu or Desktop shortcut. Open the compressed file by clicking File > Open. If your system has the compressed file extension associated with WinZip program, just double-click on the file.
Reading datasets from memory is supported since netCDF4-1.2.8 (Changelog):
import netCDF4
import gzip
with gzip.open('test.nc.gz') as gz:
with netCDF4.Dataset('dummy', mode='r', memory=gz.read()) as nc:
print(nc.variables)
See the description of the memory
parameter in the Dataset
documentation
Because NetCDF4-Python wraps the C NetCDF4 library, you're out of luck as far as using the gzip module to pass in a file-like object. The only option is, as suggested by @tdelaney, to use the gzip to extract to a temporary file.
If you happen to have any control over the creation of these files, NetCDF version 4 files support zlib compression internally, so that using gzip is superfluous. It might also be worth converting the files from version 3 to version 4 if you need to repeatedly process these files.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With