Adding or removing specific rows or columns in an h5py dataset

Tags: python, hdf5, h5py

Once you create an h5py dataset, how do you add or remove specific rows or columns from an NxM array?

My question is similar to this one, but I don't want to blindly truncate or expand the array. When removing, I need to be able to specify the exact row or column to remove.

For adding, I know I have to specify maxshape=(None, None) when creating the initial dataset, but the resize method doesn't seem to let you specify which rows or columns get truncated if you shrink the size.

Cerin asked Apr 22 '14 18:04


1 Answer

h5py isn't really designed for doing this. Pandas might be a better library to use, as it's built around the concept of tables.

Having said that, here's how to do it:

In [1]: import h5py
   ...: from numpy.random import rand
   ...: f = h5py.File('test.h5', 'w')

In [2]: arr = rand(4,4)

In [3]: dset = f.create_dataset('foo',data=arr,maxshape=(2000,2000))

In [4]: dset[:]
Out[4]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.4194363 ,  0.4691813 ,  0.95648712,  0.56120731],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

In [5]: dset[1:-1,:] = dset[2:,:]

In [6]: dset.resize((3,4))

In [7]: dset[:]
Out[7]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

This removes row 1 from dset. It does so by copying rows 2 and 3 into positions 1 and 2, respectively, and then shrinking the dataset by one row. Swap the subscripts (and shrink the second dimension instead of the first) to remove column 1. You can easily write a wrapper around this if you're going to be doing it a lot.
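A wrapper along those lines might look like the sketch below. The names remove_index and insert_index are hypothetical helpers, not part of h5py, and the code assumes a 2-D dataset that was created as resizable (maxshape set) and is small enough that the shifted block fits in memory. Inserting is the same trick in reverse: grow the dataset by one, shift the tail along, then write the new values into the gap.

```python
import h5py
import numpy as np


def remove_index(dset, index, axis=0):
    """Delete one row (axis=0) or column (axis=1) from a resizable 2-D dataset.

    Hypothetical helper, not an h5py API: it shifts everything after
    `index` back by one position, then shrinks the dataset by one.
    """
    n = dset.shape[axis]
    if not 0 <= index < n:
        raise IndexError("index %d out of range for axis %d" % (index, axis))
    if index < n - 1:
        # The right-hand side is read into memory before it is written
        # back, so the overlapping source and target ranges are safe.
        if axis == 0:
            dset[index:n - 1, :] = dset[index + 1:n, :]
        else:
            dset[:, index:n - 1] = dset[:, index + 1:n]
    new_shape = list(dset.shape)
    new_shape[axis] = n - 1
    dset.resize(tuple(new_shape))


def insert_index(dset, index, values, axis=0):
    """Insert `values` as a new row (axis=0) or column (axis=1) at `index`.

    Hypothetical helper: grows the dataset by one (maxshape must allow
    it), shifts the tail forward, then writes the new values.
    """
    n = dset.shape[axis]
    if not 0 <= index <= n:
        raise IndexError("index %d out of range for axis %d" % (index, axis))
    new_shape = list(dset.shape)
    new_shape[axis] = n + 1
    dset.resize(tuple(new_shape))
    if axis == 0:
        if index < n:
            dset[index + 1:n + 1, :] = dset[index:n, :]
        dset[index, :] = values
    else:
        if index < n:
            dset[:, index + 1:n + 1] = dset[:, index:n]
        dset[:, index] = values
```

With the dataset from the session above, remove_index(dset, 1) would drop row 1 and remove_index(dset, 1, axis=1) would drop column 1. Each call rewrites everything after the removed position, so for large datasets with frequent edits it may be cheaper to batch the changes in memory and rewrite the dataset once.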

Yossarian answered Sep 17 '22 16:09