
Saving with h5py arrays of different sizes

I am trying to store about 3000 NumPy arrays in the HDF5 data format. The arrays vary in length from 5306 to 121999 elements, all of dtype np.float64.

I am getting the error "Object dtype dtype('O') has no native HDF5 equivalent", because the data is irregular and NumPy falls back to the general object dtype.

My idea was to pad all the arrays to length 121999 and store the original sizes in another dataset.

However, this seems quite inefficient in terms of space. Is there a better way?
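For reference, the padding idea described above could look roughly like the sketch below. The toy arrays, the dataset names "padded" and "lengths", the in-memory file, and the use of compression="gzip" are all assumptions made for illustration. Compression may reduce the on-disk cost of the zero padding, but that is not measured here.

```python
import numpy as np
import h5py

# Toy stand-ins for the real data (assumed: 1D float64 arrays of varying length).
arrays = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    np.array([0.1, 0.2]),
]

# Record the true lengths, then zero-pad every array to the longest one.
lengths = np.array([len(a) for a in arrays], dtype=np.int64)
padded = np.zeros((len(arrays), int(lengths.max())), dtype=np.float64)
for i, a in enumerate(arrays):
    padded[i, :len(a)] = a

# driver="core" with backing_store=False keeps this demo file in memory only.
with h5py.File("padded_demo.h5", "w", driver="core", backing_store=False) as f:
    # gzip is an assumption here; it is one way to shrink the padded zeros.
    f.create_dataset("padded", data=padded, compression="gzip")
    f.create_dataset("lengths", data=lengths)

    # Read back and trim each row to its recorded length.
    restored = [f["padded"][i, :int(n)] for i, n in enumerate(f["lengths"][:])]
```

The second dataset is what makes the padding reversible: without the stored lengths, trailing zeros that belong to the data could not be told apart from padding.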

EDIT: To clarify, I want to store 3126 arrays of dtype = np.float64. I have them stored in a list, and when h5py processes the list it converts it to an array of dtype = object because the arrays have different lengths. To illustrate:

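import numpy as np

a = np.array([0.1,0.2,0.3],dtype=np.float64)
b = np.array([0.1,0.2,0.3,0.4,0.5],dtype=np.float64)
c = np.array([0.1,0.2],dtype=np.float64)

# An equivalent conversion is performed inside the h5py call. Recent NumPy
# versions need dtype=object spelled out for ragged input; older ones inferred it.
arrs = np.array([a,b,c],dtype=object)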
print(arrs.dtype)
>>> object
print(arrs[0].dtype)
>>> float64
Jose Javier Gonzalez Ortiz asked May 13 '16 15:05



2 Answers

Looks like you tried something like:

In [364]: f=h5py.File('test.hdf5','w')    
In [365]: grp=f.create_group('alist')

In [366]: grp.create_dataset('alist',data=[a,b,c])
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

But if you instead save the arrays as separate datasets, it works:

In [367]: adict=dict(a=a,b=b,c=c)

In [368]: for k,v in adict.items():
   .....:     grp.create_dataset(k,data=v)
   .....:     

In [369]: grp
Out[369]: <HDF5 group "/alist" (3 members)>

In [370]: grp['a'][:]
Out[370]: array([ 0.1,  0.2,  0.3])

To access all the datasets in the group:

In [389]: [i[:] for i in grp.values()]
Out[389]: 
[array([ 0.1,  0.2,  0.3]),
 array([ 0.1,  0.2,  0.3,  0.4,  0.5]),
 array([ 0.1,  0.2])]
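With a few thousand arrays rather than three, the same one-dataset-per-array approach might be scripted as in the sketch below. The zero-padded index names and the in-memory file are assumptions for illustration, not part of the session above. Fixed-width names are used so that sorting by name gives back the original list order.

```python
import numpy as np
import h5py

# Toy stand-ins for the real list of arrays (assumed 1D float64).
arrays = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    np.array([0.1, 0.2]),
]

# driver="core" with backing_store=False keeps this demo file in memory only.
with h5py.File("separate_demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("alist")
    for i, arr in enumerate(arrays):
        # Hypothetical naming scheme: "00000", "00001", ... keeps name order
        # identical to list order, unlike "1", "10", "2".
        grp.create_dataset(f"{i:05d}", data=arr)

    # Sort the names explicitly instead of relying on iteration order.
    names = sorted(grp.keys())
    loaded = [grp[name][:] for name in names]
```

Each array keeps its own length and dtype, so no padding or separate length bookkeeping is needed.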
hpaulj answered Oct 12 '22 23:10



A clean method for variable-length arrays is h5py's vlen dtype: http://docs.h5py.org/en/latest/special.html?highlight=dtype#arbitrary-vlen-data

hdf5_file = h5py.File('yourdataset.hdf5', mode='w')
dt = h5py.special_dtype(vlen=np.dtype('float64'))
hdf5_file.create_dataset('dataset', (3,), dtype=dt)
hdf5_file['dataset'][...] = arrs

print(hdf5_file['dataset'][...])
>>>array([array([0.1,0.2,0.3],dtype=np.float64),
>>>array([0.1,0.2,0.3,0.4,0.5],dtype=np.float64),
>>>array([0.1,0.2],dtype=np.float64)], dtype=object)

Note that this only works for 1D arrays; see https://github.com/h5py/h5py/issues/876
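A self-contained variant of the same idea is sketched below. It assumes h5py 2.10 or newer, where h5py.vlen_dtype is available as the newer spelling of special_dtype(vlen=...). The toy arrays, the in-memory file, and the element-by-element assignment are choices made for this illustration.

```python
import numpy as np
import h5py

# Toy stand-ins for the real arrays (assumed 1D float64, varying length).
arrays = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    np.array([0.1, 0.2]),
]

# Variable-length float64 element type (assumes h5py >= 2.10).
dt = h5py.vlen_dtype(np.dtype("float64"))

# driver="core" with backing_store=False keeps this demo file in memory only.
with h5py.File("vlen_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dataset", (len(arrays),), dtype=dt)

    # Assign one element at a time, which avoids building a ragged object array.
    for i, arr in enumerate(arrays):
        dset[i] = arr

    # Each element reads back as an ordinary float64 array of its own length.
    loaded = [dset[i] for i in range(len(arrays))]
    loaded_lengths = [len(x) for x in loaded]
```

Reading a single element such as dset[i] returns a plain 1D array, so the original lengths are recovered without a separate dataset of sizes.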

Joshua Lim answered Oct 13 '22 00:10
