
Python HDF5 H5Py issues opening multiple files

Tags: python, hdf5, h5py

I am using the 64-bit version of Enthought Python to process data across multiple HDF5 files. I'm using h5py version 1.3.1 (HDF5 1.8.4) on 64-bit Windows.

I have an object that provides a convenient interface to my specific data hierarchy, but calling h5py.File(fname, 'r') directly, without that object, yields the same results. I am iterating through a long list (~100 files at a time) and attempting to pull specific pieces of information out of each file. The problem is that I'm getting the same information out of several files! My loop looks something like:

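import csv
from glob import glob
import h5py as hdf5  # assumed alias; the post calls hdf5.File

files = glob(r'path\*.h5')
# 'wb' rather than 'rb': the csv file is written, not read (Python 2 csv convention)
out_csv = csv.writer(open('output_file.csv', 'wb'))

for filename in files:
    handle = hdf5.File(filename, 'r')
    data = extract_data_from_handle(handle)  # my own helper, not shown here
    for row in data:
        out_csv.writerow((filename,) + row)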

When I inspect the files with a tool such as HDFView, I can see that their contents are different. However, the csv I get seems to indicate that all the files contain the same data. Has anyone seen this behavior before? Any suggestions on where to start debugging this issue?
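
One way to start narrowing this down is to check which files actually produce identical rows. The following is a minimal sketch and not part of the original post: `group_by_content` is a hypothetical helper, and it assumes the rows extracted from each file have already been copied into plain Python tuples.

```python
import hashlib
from collections import defaultdict


def group_by_content(rows_by_file):
    """Group filenames whose extracted rows are identical.

    rows_by_file: dict mapping filename -> list of row tuples
    (hypothetical input; build it from your own extraction step).
    Returns a list of sorted filename groups, one per set of duplicates.
    """
    groups = defaultdict(list)
    for name, rows in rows_by_file.items():
        # Hash the textual form of the rows so equal content gets an equal key.
        digest = hashlib.sha1(repr(list(rows)).encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one file indicate duplicated output.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

If files that differ in HDFView end up in the same group, that would suggest the problem lies in how the files are read rather than in the files themselves.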

Carl F. asked Oct 09 '22 20:10


1 Answer

I've concluded that this is a strange manifestation of the issue described in the question "Perplexing assignment behavior with h5py object as instance variable". I rewrote my code so that each file is handled within a function call and the handle variable is not reused. With this approach I no longer see the strange behavior, and it seems to work much better. For clarity, the solution looks more like:

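import csv
from glob import glob
import h5py as hdf5  # assumed alias; the post calls hdf5.File

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'wb'))  # 'wb': the file is written

def extract_data_from_filename(filename):
    # The handle is local to this call, so no variable is reused across files.
    return extract_data_from_handle(hdf5.File(filename, 'r'))

for filename in files:
    data = extract_data_from_filename(filename)
    for row in data:
        out_csv.writerow((filename,) + row)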
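
A variation on the same idea is to close each file explicitly before opening the next one, so that no handle outlives its iteration. This is only a sketch under stated assumptions: it targets Python 3, `export_rows` and its `open_file` parameter are hypothetical names, and `extract_rows` stands in for `extract_data_from_handle` and is assumed to return plain tuples rather than objects tied to the open file.

```python
import csv
from contextlib import closing


def export_rows(filenames, extract_rows, out_path, open_file=None):
    """Write (filename, *row) to a csv for every row of every file.

    extract_rows: callable taking an open handle and returning row tuples.
    open_file: callable returning a handle with a close() method;
        defaults to opening the file read-only with h5py.File.
    """
    if open_file is None:
        import h5py  # imported lazily so the helper is testable without it

        def open_file(name):
            return h5py.File(name, "r")

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for name in filenames:
            # closing() calls handle.close() as soon as the block exits.
            with closing(open_file(name)) as handle:
                # Copy the rows out while the file is still open.
                rows = list(extract_rows(handle))
            for row in rows:
                writer.writerow((name,) + tuple(row))
```

A call might look like `export_rows(glob(r'path\*.h5'), extract_data_from_handle, 'output_file.csv')`. Whether explicit closing alone avoids the behavior in h5py 1.3.1 is not something the original post confirms.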

Carl F. answered Oct 12 '22 09:10