Efficient way to extract data from NETCDF files

I have a number of coordinates (roughly 20,000) for which I need to extract data from several NetCDF files, each with roughly 30,000 time steps (future climate scenarios). Using the solution here is not efficient, because of the time spent at each i,j converting "dsloc" to a DataFrame (see the code below). An example NetCDF file can be downloaded from here.

import time

import pandas as pd
import xarray as xr

# Generate some coordinates
coords_data = [{'lat': 68.04, 'lon': 15.20, 'stid': 1},
               {'lat': 67.96, 'lon': 14.95, 'stid': 2}]
crd = pd.DataFrame(coords_data)
lat = crd["lat"]
lon = crd["lon"]
stid = crd["stid"]

NC = xr.open_dataset(nc_file)  # nc_file: path to the NetCDF file
point_list = zip(lat, lon, stid)
frames = []  # one DataFrame per station, concatenated after the loop
start_time = time.time()
for i, j, station_id in point_list:
    print(i, j)
    # Select the grid cell nearest to this coordinate
    dsloc = NC.sel(lat=i, lon=j, method='nearest')
    print("--- %s seconds ---" % (time.time() - start_time))
    # Converting to a DataFrame is the slow step
    DT = dsloc.to_dataframe()
    DT.insert(loc=0, column="station", value=station_id)
    DT.reset_index(inplace=True)
    frames.append(DT)
    print("--- %s seconds ---" % (time.time() - start_time))
temp = pd.concat(frames, sort=True)

which results in:

68.04 15.2
--- 0.005853414535522461 seconds ---
--- 9.02660846710205 seconds ---
67.96 14.95
--- 9.028568267822266 seconds ---
--- 16.429600715637207 seconds ---

which means each i,j takes roughly 7 to 9 seconds to process. Given the large number of coordinates and NetCDF files with many time steps, I wonder if there is a Pythonic way to optimize the code. I could also use the CDO and NCO operators, but I ran into a similar issue with them too.

asked Sep 25 '21 23:09

Seji

1 Answer

This is a perfect use case for xarray's advanced indexing using a DataArray index.

# Make the index on your coordinates DataFrame the station ID,
# then convert to a dataset.
# This results in a Dataset with two DataArrays, lat and lon, each
# of which is indexed by a single dimension, stid
crd_ix = crd.set_index('stid').to_xarray()

# now, select using the arrays, and the data will be re-oriented to have
# the data only for the desired pixels, indexed by 'stid'. The
# non-indexing coordinates lat and lon will be indexed by (stid) as well.
NC.sel(lon=crd_ix.lon, lat=crd_ix.lat, method='nearest')

Other dimensions in the data are left untouched, so if your original data has dimensions (lat, lon, z, time), the result will have dimensions (stid, z, time).
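To make this concrete, here is a minimal, self-contained sketch of the same selection. It is run against a small synthetic dataset rather than the real NetCDF file, so the grid spacing, the time axis and the variable name "tas" are all assumptions made up for illustration. The point is that the nearest-neighbour lookup and the conversion to a DataFrame each happen once for all stations, instead of once per station.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the NetCDF file (assumption: a 0.25-degree grid
# and a single hypothetical variable named "tas").
lat = np.arange(67.0, 69.01, 0.25)
lon = np.arange(14.0, 16.01, 0.25)
time = pd.date_range("2000-01-01", periods=5, freq="D")
data = np.random.default_rng(0).random((time.size, lat.size, lon.size))
NC = xr.Dataset(
    {"tas": (("time", "lat", "lon"), data)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Station coordinates, as in the question.
crd = pd.DataFrame(
    [{"lat": 68.04, "lon": 15.20, "stid": 1},
     {"lat": 67.96, "lon": 14.95, "stid": 2}]
)
crd_ix = crd.set_index("stid").to_xarray()

# One vectorized nearest-neighbour lookup for all stations at once.
points = NC.sel(lat=crd_ix.lat, lon=crd_ix.lon, method="nearest")

# A single conversion to a DataFrame instead of one per station.
df = points.to_dataframe().reset_index()
print(df.head())
```

In the resulting DataFrame, "stid" plays the role of the "station" column from the question's loop, and the "lat" and "lon" columns hold the coordinates of the matched grid cells rather than the requested station coordinates.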

answered Oct 13 '22 15:10

Michael Delgado

Tags: python, netcdf, python-xarray, cdo-climate, nco
