Combining hdf5 files

I have a number of HDF5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all the datasets separately (i.e. not concatenated into a single dataset).

One way to do this is to create an HDF5 file and then copy the datasets one by one. This will be slow and complicated because it will need to be a buffered copy.

Is there a simpler way to do this? It seems like there should be, since it is essentially just creating a container file.

I am using python/h5py.

asked Aug 28 '13 15:08

Bitwise

1 Answer

This is actually one of the use-cases of HDF5. If you just want to be able to access all the datasets from a single file, and don't care how they're actually stored on disk, you can use external links. From the HDF5 website:

External links allow a group to include objects in another HDF5 file and enable the library to access those objects as if they are in the current file. In this manner, a group may appear to directly contain datasets, named datatypes, and even groups that are actually in a different file. This feature is implemented via a suite of functions that create and manage the links, define and retrieve paths to external objects, and interpret link names:

Here's how to do it in h5py:

    import h5py

    myfile = h5py.File('foo.hdf5', 'a')
    myfile['ext link'] = h5py.ExternalLink("otherfile.hdf5", "/path/to/resource")

Be careful: when opening myfile, you should open it with 'a' if it is an existing file. If you open it with 'w', it will erase its contents.

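This would be much faster than copying all the datasets into a new file, because creating a link writes only a small amount of metadata and never reads the dataset itself. I don't know how fast access to otherfile.hdf5 would be, but operating on all the datasets would be transparent: h5py would see all of them as residing in foo.hdf5. Note that the data stays in the original files, so they have to remain available for the links to resolve.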
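For several source files, the same idea can be wrapped in a small loop. The following is only a sketch under some assumptions: the helper name link_datasets, the file names part0.hdf5, part1.hdf5 and combined.hdf5, and the dataset path /data are all made up for illustration, and the tiny source files are created here just so the example runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def link_datasets(container_path, sources):
    """Create one external link per (link_name, file_path, dataset_path) entry.

    Hypothetical helper: nothing is copied, only links are written.
    """
    # Mode 'a' keeps whatever the container file already holds.
    with h5py.File(container_path, "a") as container:
        for link_name, file_path, dataset_path in sources:
            container[link_name] = h5py.ExternalLink(file_path, dataset_path)


# Build two small stand-in source files (assumed names, one dataset each).
workdir = tempfile.mkdtemp()
sources = []
for i in range(2):
    path = os.path.join(workdir, f"part{i}.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.full(4, i))
    sources.append((f"part{i}", path, "/data"))

container_path = os.path.join(workdir, "combined.hdf5")
link_datasets(container_path, sources)

# Read back through the container; each link resolves to its source file.
with h5py.File(container_path, "r") as combined:
    names = sorted(combined.keys())
    first_values = [int(combined[name][0]) for name in names]
    # getlink=True returns the link object instead of following it.
    link_target = combined.get("part0", getlink=True).path
```

Indexing a linked dataset (for example combined["part0"][0:2]) still reads only the requested slice from disk, so nothing here requires a whole dataset to fit in RAM.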

answered Sep 18 '22 18:09

Yossarian


Tags: python, hdf5, h5py
