How do I convert data of type <HDF5 object reference> into something more readable in Python?

I have quite a big dataset, with all the information stored in an HDF5 file. I found the h5py library for Python, and everything works properly except for this output:

[<HDF5 object reference>]

I have no idea how to convert it into something more readable. Can it be done at all? The documentation on this topic is a bit hard for me to follow. There may also be solutions in languages other than Python. I appreciate any help.

Ideally, the result should be a path to a file.

Here is the relevant part of my code:

import h5py

f = h5py.File('myfile1.mat', 'r')
# print(list(f.keys()))
test = f['db/path']
st = test[3]    # one row of the (73583, 1) dataset
print(st)

Printing st gives [<HDF5 object reference>]

Printing test gives <HDF5 dataset "path": shape (73583, 1), type "|O8">

Instead of [<HDF5 object reference>], I expect something like /home/directory/file1.jpg, if that is possible.

Asked by Dmytro Chasovskyi, Feb 16 '15 12:02


People also ask

What is HDF5 object reference?

HDF5 references are low-level pointers to other objects. The great advantage of references is that they can be stored and retrieved as data; you can create an attribute or an entire dataset of reference type. References come in two flavors, object references and region references.
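A minimal sketch of both flavors with h5py, assuming h5py 2.10 or newer (for h5py.ref_dtype, h5py.regionref_dtype and h5py.check_ref_dtype). The file name refs_demo.h5 and the dataset names are hypothetical. A dataset of object references is what h5py reports with the "|O8" (object) dtype.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "refs_demo.h5")  # hypothetical file name

with h5py.File(path, "w") as f:
    data = f.create_dataset("data", data=np.arange(10))

    # Object reference: points at a whole object (here the "data" dataset)
    obj_refs = f.create_dataset("obj_refs", (1,), dtype=h5py.ref_dtype)
    obj_refs[0] = data.ref

    # Region reference: points at a selection inside a dataset
    reg_refs = f.create_dataset("reg_refs", (1,), dtype=h5py.regionref_dtype)
    reg_refs[0] = data.regionref[2:5]

with h5py.File(path, "r") as f:
    kind = h5py.check_ref_dtype(f["obj_refs"].dtype)  # h5py.Reference for object refs
    ref = f["obj_refs"][0]
    target_name = f[ref].name          # indexing the file with a reference dereferences it
    rref = f["reg_refs"][0]
    region_values = f[rref][rref]      # slice the target dataset with the region reference
    print(kind, target_name, region_values)
```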

How do you write HDF5 in Python?

Creating HDF5 files: the first step in creating an HDF5 file is to initialise it. In h5py this uses syntax very similar to opening an ordinary file in Python: the first argument gives the file name and location, the second the mode. We are writing the file, so we pass "w" for write access.
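A short sketch of creating a file in "w" mode with h5py and reading it back; the file name example.h5, the dataset name and the "units" attribute are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "example.h5")  # hypothetical file name

# "w" creates the file, truncating it if it already exists
with h5py.File(path, "w") as f:
    dset = f.create_dataset("measurements", data=np.linspace(0.0, 1.0, 5))
    dset.attrs["units"] = "seconds"   # metadata stored alongside the data

# Re-open read-only to check what was written
with h5py.File(path, "r") as f:
    values = f["measurements"][()]
    units = f["measurements"].attrs["units"]
```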

How do I open an H5 file in Python?

With h5py, a file is opened with h5py.File(filename, 'r') for read-only access. h5py is built on NumPy, so NumPy must be installed as well, and datasets are read back as NumPy arrays. One important feature of HDF5 is that it can attach metadata (attributes) to every dataset in the file, which makes searching and accessing the data easier.
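A sketch of opening a file read-only and exploring it, which is also a practical way to find paths such as db/path inside a .mat file. The small stand-in file and its layout are hypothetical; only h5py.File, keys() and visititems() are the real API being shown.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file so the example runs on its own (hypothetical layout)
path = os.path.join(tempfile.mkdtemp(), "input.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("db/values", data=np.arange(6).reshape(2, 3))

found = {}

def record(name, obj):
    # visititems calls this for every group and dataset below the root
    if isinstance(obj, h5py.Dataset):
        found[name] = obj.shape

with h5py.File(path, "r") as f:       # "r" = read-only; the file must exist
    top_level = list(f.keys())        # names directly under the root group
    f.visititems(record)              # walk the whole hierarchy
    arr = f["db/values"][()]          # read the full dataset into a NumPy array
```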


2 Answers

A friend answered my question, and the fix turned out to be easy, although I had spent more than four hours on this small problem. The solution is to dereference the reference by indexing the file with it:

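import h5py

with h5py.File('myfile1.mat', 'r') as f:
    test = f['db/path']
    st = test[0][0]          # a single object reference
    obj = f[st]              # dereference: the file indexed by a reference returns its target
    # MATLAB stores text as uint16 character codes; flatten so chr() gets scalars
    str1 = ''.join(chr(i) for i in obj[:].flatten())
    print(str1)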

Sorry if I didn't describe my problem accurately, but this is the solution I was looking for.
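A sketch that applies the same idea to every entry of db/path at once. It assumes the MATLAB v7.3 layout, where each string is a uint16 dataset of character codes reached through an object reference; the helpers matlab_string and read_all_paths, the #refs# group and the stand-in file are hypothetical, built only so the example runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

def matlab_string(f, ref):
    """Dereference ref and decode a MATLAB char array (uint16 codes) into a str."""
    codes = f[ref][()]                        # e.g. shape (n, 1) for a 1xn MATLAB char
    return "".join(chr(c) for c in codes.flatten())

def read_all_paths(f, name):
    """Decode every reference stored in dataset `name` (hypothetical helper)."""
    refs = f[name][()].flatten()              # object array of h5py.Reference
    return [matlab_string(f, r) for r in refs]

# Stand-in file mimicking the assumed MATLAB v7.3 layout
path = os.path.join(tempfile.mkdtemp(), "stand_in.mat")
names = ["/home/directory/file1.jpg", "/home/directory/file2.jpg"]
with h5py.File(path, "w") as f:
    ref_dset = f.create_dataset("db/path", (len(names), 1), dtype=h5py.ref_dtype)
    for i, s in enumerate(names):
        codes = np.array([ord(ch) for ch in s], dtype=np.uint16).reshape(-1, 1)
        ref_dset[i, 0] = f.create_dataset(f"#refs#/s{i}", data=codes).ref

with h5py.File(path, "r") as f:
    paths = read_all_paths(f, "db/path")
```

For the 73583-entry dataset in the question this loops in Python, so it may be slow; it is a readable starting point rather than an optimized reader.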

Answered by Dmytro Chasovskyi, Sep 19 '22 11:09


You can define your own __str__() or __repr__() method for this class, or create a simple wrapper that formats a string with the information you want to see. Based on a quick look at the documentation, you could do something like this:

from h5py import File

class MyHDF5File(File):
    # Show the file name when the object is printed
    def __repr__(self):
        return '<HDF5File({0})>'.format(self.filename)
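A usage sketch of that subclass, plus a wrapper in the same spirit applied to a reference instead of the file: describe_ref is a hypothetical helper that shows the HDF5 path a reference points to (via the target's .name). The file and dataset names are also hypothetical.

```python
import os
import tempfile

from h5py import File

class MyHDF5File(File):
    # Show the file name when the object is printed
    def __repr__(self):
        return "<HDF5File({0})>".format(self.filename)

def describe_ref(f, ref):
    """Hypothetical helper: format a reference as the HDF5 path it points to."""
    return "<HDF5 object reference -> {0}>".format(f[ref].name)

path = os.path.join(tempfile.mkdtemp(), "wrapped.h5")  # hypothetical file name
with MyHDF5File(path, "w") as f:
    target = f.create_dataset("images/first", data=[1, 2, 3])
    label = describe_ref(f, target.ref)
    file_repr = repr(f)
```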

Answered by tripleee, Sep 20 '22 11:09