My usual method for extracting the min/max of a variable's data values from a NetCDF file is an order of magnitude slower with the netCDF4 Python module than with scipy.io.netcdf.
I am working with relatively large ocean model output files (from ROMS) with multiple depth levels over a given map region (Hawaii). When these were in NetCDF-3, I used scipy.io.netcdf. 
Now that these files are in NetCDF-4 ("Classic") I can no longer use scipy.io.netcdf and have instead switched over to using the netCDF4 Python module. However, the slowness is a concern and I wondered if there is a more efficient method of extracting a variable's data range (minimum and maximum data values)?
Here was my NetCDF-3 method using scipy:
import scipy.io.netcdf
netcdf = scipy.io.netcdf.netcdf_file(file)
var = netcdf.variables['sea_water_potential_temperature']
min = var.data.min()
max = var.data.max()
Here is my NetCDF-4 method using netCDF4:
import netCDF4
netcdf = netCDF4.Dataset(file)
var = netcdf.variables['sea_water_potential_temperature']
var_array = var.data.flatten()
min = var_array.data.min()
max = var_array.data.max()
The notable difference is that I must first flatten the data array in netCDF4, and this operation apparently slows things down.
Is there a better/faster way?
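One thing worth checking, offered as a minimal sketch rather than a confirmed fix: NumPy's min() and max() already reduce over every element of a multidimensional array, so flattening is not required. Reading the variable one slab at a time along its first dimension also keeps memory use bounded. The example below uses a plain NumPy array as a stand-in for the netCDF4 variable (which accepts the same slice syntax), and the helper name minmax_by_slab is hypothetical.

```python
import numpy as np

def minmax_by_slab(var):
    """Return (min, max) of an array-like, reading one slab along axis 0 at a time.

    `var` only needs a `.shape` and integer indexing on its first axis,
    so a NumPy array or a netCDF4 variable should both work here.
    """
    lo, hi = np.inf, -np.inf
    for i in range(var.shape[0]):
        slab = np.ma.asarray(var[i])   # keeps any mask on missing values
        if slab.count() == 0:          # slab is entirely masked: skip it
            continue
        lo = min(lo, float(slab.min()))
        hi = max(hi, float(slab.max()))
    return lo, hi

# Stand-in data: 2 time steps x 3 rows x 4 columns (an assumption for illustration).
data = np.arange(24, dtype="float64").reshape(2, 3, 4)
vmin, vmax = minmax_by_slab(data)
print(vmin, vmax)
```

Whether this is faster than a single whole-array read will depend on how the file is chunked and compressed, so it is worth timing both on a real file.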
Per hpaulj's suggestion, here is a function that calls the NCO command ncwa via subprocess. It hangs badly when given an OPeNDAP address, and I don't have any files on hand to test it locally.
You can see if it works for you and what the speed difference is.
This assumes you have NCO installed.
def ncwa(path, fnames, var, op_type, times=None, lons=None, lats=None):
    '''Perform arithmetic operations on netCDF file or OPeNDAP data
    Args
    ----
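    path: str
        Path or URL prefix prepended to each file name (passed to ncwa -p)
    fnames: str or iterable
        Names of file(s) to perform operation on
    var: str
        Name of the variable to read from the ncwa output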
    op_type: str
        ncwa arithmetic operation to perform. Available operations are:
        avg,mabs,mebs,mibs,min,max,ttl,sqravg,avgsqr,sqrt,rms,rmssdn
    times: tuple
        Minimum and maximum timestamps within which to perform the operation
    lons: tuple
        Minimum and maximum longitudes within which to perform the operation
    lats: tuple
        Minimum and maximum latitudes within which to perform the operation
    Returns
    -------
    result: float
        Result of the operation on the selected data
    Note
    ----
    Adapted from the OPeNDAP examples in the NCO documentation:
    http://nco.sourceforge.net/nco.html#OPeNDAP
    '''
    import os
    import netCDF4
    import numpy
    import subprocess
    output = 'tmp_output.nc'
    # Concatenate subprocess command
    cmd = ['ncwa']
    cmd.extend(['-y', '{}'.format(op_type)])
    if times:
        cmd.extend(['-d', 'time,{},{}'.format(times[0], times[1])])
    if lons:
        cmd.extend(['-d', 'lon,{},{}'.format(lons[0], lons[1])])
    if lats:
        cmd.extend(['-d', 'lat,{},{}'.format(lats[0], lats[1])])
    cmd.extend(['-p', path])
    cmd.extend(numpy.atleast_1d(fnames).tolist())
    cmd.append(output)
    # Run cmd and check for errors
    subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # Load, read, close data and delete temp .nc file
    data = netCDF4.Dataset(output)
    result = float(data[var][:])
    data.close()
    os.remove(output)
    return result
path = 'https://ecowatch.ncddc.noaa.gov/thredds/dodsC/hycom/hycom_reg6_agg/'
fname = 'HYCOM_Region_6_Aggregation_best.ncd'
times = (0.0, 48.0)
lons = (201.5, 205.5)
lats = (18.5, 22.5)
smax = ncwa(path, fname, 'salinity', 'max', times, lons, lats)
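Since the call can hang on an OPeNDAP address, it may help to look at the exact command before running it. The following is a small sketch, not part of the function above: build_ncwa_cmd is a hypothetical helper that assembles the same argument list (same flags, same order) and prints it, so it can be pasted into a shell and run by hand. It does not invoke ncwa.

```python
import shlex

def build_ncwa_cmd(path, fnames, op_type, output, times=None, lons=None, lats=None):
    """Assemble the ncwa argument list without running it (hypothetical helper)."""
    cmd = ['ncwa', '-y', str(op_type)]
    # One '-d name,min,max' hyperslab option per dimension that was given.
    for dim, bounds in (('time', times), ('lon', lons), ('lat', lats)):
        if bounds:
            cmd.extend(['-d', '{},{},{}'.format(dim, bounds[0], bounds[1])])
    cmd.extend(['-p', path])
    if isinstance(fnames, str):
        fnames = [fnames]
    cmd.extend(fnames)
    cmd.append(output)
    return cmd

cmd = build_ncwa_cmd(
    'https://ecowatch.ncddc.noaa.gov/thredds/dodsC/hycom/hycom_reg6_agg/',
    'HYCOM_Region_6_Aggregation_best.ncd',
    'max',
    'tmp_output.nc',
    times=(0.0, 48.0),
    lons=(201.5, 205.5),
    lats=(18.5, 22.5),
)
# Shell-quoted form of the command, for inspection or manual execution.
print(shlex.join(cmd))
```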
If you're just getting the min/max values across an array of a variable, you can use xarray.
import xarray as xr
da = xr.open_dataset('infile/file.nc')
max = da.sea_water_potential_temperature.max()
min = da.sea_water_potential_temperature.min()
This should give you a single value each for the min and max. You could also get the min/max of a variable across a selected dimension such as time, longitude, or latitude. Xarray is great for handling multidimensional arrays, which is why it makes working with NetCDF in Python pretty easy when you're not using other tools like CDO and NCO. Lastly, xarray is also used in other related libraries that deal with weather and climate data in Python ( http://xarray.pydata.org/en/stable/related-projects.html ).
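As a rough illustration of reducing over selected dimensions, here is a self-contained sketch. It builds a small in-memory DataArray instead of opening a file, so the array shape, the coordinate values, and the dimension names time, lat, and lon are all assumptions made for the example; a real file may name its dimensions differently.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a temperature field: 2 times x 3 lats x 4 lons.
values = np.arange(24, dtype="float64").reshape(2, 3, 4)
temp = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1],
        "lat": [18.5, 20.5, 22.5],
        "lon": [201.5, 202.5, 203.5, 204.5],
    },
    name="sea_water_potential_temperature",
)

# Reduce over everything: one value each.
overall_min = float(temp.min())
overall_max = float(temp.max())

# Reduce over space only: one maximum per time step.
max_per_time = temp.max(dim=("lat", "lon"))

# Restrict to a lat/lon box first (label-based, inclusive slices), then reduce.
box = temp.sel(lat=slice(18.5, 20.5), lon=slice(201.5, 203.5))
box_max = float(box.max())

print(overall_min, overall_max, max_per_time.values, box_max)
```

The same calls should apply to a variable taken from xr.open_dataset, with the dimension names swapped for whatever the file actually uses.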