If I create an HDF5 file with pandas using the following code:
import pandas as pd
store = pd.HDFStore("store.h5")
for x in range(1000):
    store["name"+str(x)] = pd.Series()
All the series are empty, so why does the "store.h5" file take up 1.1 GB of space on the hard drive?
Short version: You have found a bug. Quoting this bug on GitHub:
...required a bit of a hackjob (pytables doesn't like zero-length objects)
I can reproduce this error on my machine. Simply changing your code to this:
import pandas as pd
store = pd.HDFStore("store.h5")
for x in range(1000):
    store["name"+str(x)] = pd.Series([1,2])
results in a sane, megabyte-scale file. I cannot find an open bug for this on GitHub; you might try reporting it.
I assume you've already dealt with the issue in your code, but if you haven't, you should probably just check to make sure that no array dimensions are zero before storing an object:
import numpy as np
import pandas as pd
toStore = pd.Series()
assert np.prod(toStore.shape) != 0, 'Tried to store an empty object!'
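If you would rather skip empty objects than fail on an assert, the same check can be wrapped in a small helper. This is only a sketch: `is_empty` and `put_if_non_empty` are names I made up for illustration, not part of pandas, and a plain dict stands in for the `HDFStore` so the snippet runs without PyTables installed.

```python
import pandas as pd


def is_empty(obj):
    """Return True if a pandas object has no elements (some dimension is zero)."""
    return obj.size == 0


def put_if_non_empty(store, key, obj):
    """Store obj under key unless it is empty; return True if it was stored.

    `store` is anything that supports item assignment, e.g. a pd.HDFStore or a dict.
    """
    if is_empty(obj):
        return False  # skip zero-length objects instead of writing them
    store[key] = obj
    return True


# Demonstration with a plain dict standing in for pd.HDFStore("store.h5").
demo_store = {}
stored_empty = put_if_non_empty(demo_store, "name0", pd.Series(dtype=float))
stored_full = put_if_non_empty(demo_store, "name1", pd.Series([1, 2]))
```

In your loop you would pass the real store in place of `demo_store`. Whether silently skipping empty series is acceptable depends on whether the code that reads the file back expects every key to exist.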