Once you create an h5py dataset, how do you add or remove specific rows or columns from an NxM array?
My question is similar to this one, but I don't want to blindly truncate or expand the array. When removing, I need to be able to specify the exact row or column to remove.
For adding, I know I have to specify maxshape=(None, None) when creating the initial dataset, but the resize method doesn't seem to let you specify which rows or columns get truncated if you shrink the size.
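The sketch below makes that behaviour concrete. It assumes NumPy and h5py are installed; the file name demo.h5 and the in-memory "core" driver are only for illustration. It shows that resize only ever adds or drops elements at the end of each axis:

```python
import h5py
import numpy as np

original = np.arange(12).reshape(3, 4)

# In-memory HDF5 file (driver="core", backing_store=False), so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None, None) lets both axes grow without limit; h5py switches
    # the dataset to chunked storage automatically, which resizing requires.
    dset = f.create_dataset("foo", data=original, maxshape=(None, None))

    # Growing appends the new row at the end; it holds the dataset's fill value.
    dset.resize((4, 4))
    grown = dset[:]

    # Shrinking always discards the trailing rows/columns; resize has no
    # parameter for choosing which ones go.
    dset.resize((2, 3))
    shrunk = dset[:]

print(grown)
print(shrunk)
```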
h5py isn't really designed for doing this. Pandas might be a better library to use, as it's built around the concept of tables.
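Here is a rough illustration of the pandas route. It is a minimal sketch that assumes pandas, NumPy and h5py are installed, and demo.h5 with the in-memory "core" driver is illustrative only. It loads the whole dataset into a DataFrame, drops a row and a column by label, and writes the result back as a new dataset, so it only suits data that fits in memory:

```python
import h5py
import numpy as np
import pandas as pd

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("foo", data=np.arange(16.0).reshape(4, 4))

    # Read the entire dataset into memory as a DataFrame.
    df = pd.DataFrame(f["foo"][:])

    # Rows and columns are dropped by label; the default labels are 0..N-1.
    df = df.drop(index=1).drop(columns=2)

    # Replace the stored dataset with the edited table.
    del f["foo"]
    f.create_dataset("foo", data=df.to_numpy())
    result = f["foo"][:]

print(result)
```

Deleting a dataset from an HDF5 file does not shrink the file on disk. The h5repack tool is typically used to reclaim that space.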
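Having said that, here's how to do it in an IPython session (it assumes import h5py and from numpy.random import rand have already been run):
In [1]: f = h5py.File('test.h5', 'w')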
In [2]: arr = rand(4,4)
In [3]: dset = f.create_dataset('foo',data=arr,maxshape=(2000,2000))
In [4]: dset[:]
Out[4]:
array([[ 0.29732874, 0.59310285, 0.61116263, 0.79950116],
       [ 0.4194363 , 0.4691813 , 0.95648712, 0.56120731],
       [ 0.76868585, 0.07556214, 0.39854704, 0.73415885],
       [ 0.0919063 , 0.0420656 , 0.35082375, 0.62565894]])
In [5]: dset[1:-1,:] = dset[2:,:]
In [6]: dset.resize((3,4))
In [7]: dset[:]
Out[7]:
array([[ 0.29732874, 0.59310285, 0.61116263, 0.79950116],
       [ 0.76868585, 0.07556214, 0.39854704, 0.73415885],
       [ 0.0919063 , 0.0420656 , 0.35082375, 0.62565894]])
This removes row 1 from dset. It does so by copying rows 2 and 3 into rows 1 and 2, respectively, and then shrinking the dataset by one row. To remove column 1 from the original 4x4 dataset instead, swap the subscripts: dset[:,1:-1] = dset[:,2:] followed by dset.resize((4,3)). You can easily write a wrapper around this if you're going to be doing it a lot.
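One possible shape for such a wrapper is sketched below. delete_index and insert_index are hypothetical helper names, not part of h5py. They assume a 2-D dataset created with a maxshape that allows resizing along the axis being changed. Deletion uses the same shift-then-resize idea as the session; insertion does the reverse (resize, shift, then write the new row or column). Each shift reads the moved block into memory first, so for very large datasets you may want to move it in chunks instead.

```python
import h5py
import numpy as np


def delete_index(dset, index, axis=0):
    """Hypothetical helper: remove row (axis=0) or column (axis=1) `index`
    from a resizable 2-D h5py dataset."""
    n = dset.shape[axis]
    if not 0 <= index < n:
        raise IndexError(f"index {index} out of range for axis of length {n}")
    if index < n - 1:
        # Shift everything after `index` back by one along `axis`.
        if axis == 0:
            dset[index:n - 1, :] = dset[index + 1:n, :]
        else:
            dset[:, index:n - 1] = dset[:, index + 1:n]
    # Drop the now-duplicated last row/column.
    new_shape = list(dset.shape)
    new_shape[axis] = n - 1
    dset.resize(tuple(new_shape))


def insert_index(dset, index, values, axis=0):
    """Hypothetical helper: insert `values` as a new row (axis=0) or column
    (axis=1) at position `index`; maxshape must allow growth along `axis`."""
    n = dset.shape[axis]
    if not 0 <= index <= n:
        raise IndexError(f"index {index} out of range for insertion into length {n}")
    # Grow by one at the end, then shift the tail forward to open a gap.
    new_shape = list(dset.shape)
    new_shape[axis] = n + 1
    dset.resize(tuple(new_shape))
    if axis == 0:
        if index < n:
            dset[index + 1:n + 1, :] = dset[index:n, :]
        dset[index, :] = values
    else:
        if index < n:
            dset[:, index + 1:n + 1] = dset[:, index:n]
        dset[:, index] = values


# Usage on an in-memory file (demo.h5 is illustrative; nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("foo", data=np.arange(16.0).reshape(4, 4),
                            maxshape=(None, None))
    delete_index(dset, 1, axis=0)                      # drop row 1    -> shape (3, 4)
    delete_index(dset, 2, axis=1)                      # drop column 2 -> shape (3, 3)
    insert_index(dset, 0, [-1.0, -1.0, -1.0], axis=0)  # new first row -> shape (4, 3)
    result = dset[:]

print(result)
```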