Reading GOES-16 data with xarray directly from S3, without downloading to the local system. The issue is that I cannot concatenate S3Files. I am retrieving 24 files from S3 and want to read and extract the data from these files for the time range:
This is the code:
import datetime as dt
import xarray as xr
import fsspec
import s3fs
fs = fsspec.filesystem('s3', anon=True)
urls1 = []
for i in range(2):
    urls = [
        's3://' + f
        for f in fs.glob(f"s3://noaa-goes16/ABI-L2ACMC/2022/001/{i:02}/*.nc")
    ]
    urls1 = urls1 + urls

with fs.open(urls1[0]) as fileObj:
    ds = xr.open_dataset(fileObj, engine='h5netcdf')
However, I run into the error: I/O operation on closed file.
As with most file object interfaces in Python, opening a file-like object with a context manager closes the file on exit. So in the following example:
# use fs.open to create an S3File object
with fs.open(urls1[0], mode="rb") as fileObj:
    # open the netcdf for reading, but don't load the data - instead, just
    # establish a lazy-load connection to the underlying S3File object
    ds = xr.open_dataset(fileObj, engine='h5netcdf')
# <-- exiting the context closes the S3File object here

# attempt to access the data again, after the stream is closed
ds.load()  # raises IOError
Instead, you should either load all the data within the context manager:
with fs.open(urls1[0], mode="rb") as fileObj:
    with xr.open_dataset(fileObj, engine='h5netcdf') as ds:
        ds = ds.load()
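Since the original goal is to read 24 files and combine them, the same load-within-context pattern can be applied to each URL and the results concatenated along the time dimension. Below is a minimal sketch; it uses small local netCDF files (via fsspec's "file" filesystem) as stand-ins for the S3 objects so it runs without network access, and it assumes the time dimension is named "t" as in the GOES-16 ABI files. The variable name BCM is only illustrative.

```python
import os
import tempfile

import fsspec
import xarray as xr

# Create two tiny netCDF files as local stand-ins for the S3 objects;
# with the real data you would use fs = fsspec.filesystem('s3', anon=True)
# and the noaa-goes16 URLs collected above.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    stand_in = xr.Dataset({"BCM": ("t", [float(i)])}, coords={"t": [i]})
    path = os.path.join(tmpdir, f"file{i}.nc")
    stand_in.to_netcdf(path, engine="h5netcdf")
    paths.append(path)

fs = fsspec.filesystem("file")  # same interface as the 's3' filesystem

# load each dataset fully while its file handle is still open ...
datasets = []
for path in paths:
    with fs.open(path, mode="rb") as fileObj:
        with xr.open_dataset(fileObj, engine="h5netcdf") as ds:
            datasets.append(ds.load())

# ... then concatenate along the time dimension
combined = xr.concat(datasets, dim="t")
print(combined.sizes["t"])  # 2
```

Because every dataset is fully loaded before its handle closes, the concatenated result can be used freely afterwards.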
Or, if you're planning to use the dataset in later code without loading:
fileObj = fs.open(urls1[0], mode="rb")
ds = xr.open_dataset(fileObj, engine='h5netcdf')
# other data operations
# be sure to close the connections when you're done
ds.close()
fileObj.close()
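For many files, one way to keep the lazy access while still combining everything is to hold all the handles open and let xr.open_mfdataset stitch them together (this requires dask to be installed). A minimal sketch, again using local stand-in files rather than the real noaa-goes16 objects, with the "t" dimension and BCM variable as assumptions modeled on the ABI files:

```python
import os
import tempfile

import fsspec
import xarray as xr

# local stand-ins for the S3 objects
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(tmpdir, f"file{i}.nc")
    xr.Dataset({"BCM": ("t", [float(i)])}, coords={"t": [i]}).to_netcdf(
        path, engine="h5netcdf"
    )
    paths.append(path)

fs = fsspec.filesystem("file")  # swap for fsspec.filesystem('s3', anon=True)

# keep every handle open so the data can be loaded lazily later
handles = [fs.open(p, mode="rb") for p in paths]
ds = xr.open_mfdataset(handles, engine="h5netcdf",
                       combine="nested", concat_dim="t")
n_times = int(ds.sizes["t"])
print(n_times)  # 2

# close the dataset and all handles when done
ds.close()
for h in handles:
    h.close()
```

The trade-off is the same as in the two-step example above: nothing is read into memory until you index or `.load()` the combined dataset, but all of the underlying file handles must stay open until then.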