The HDF5 format apparently does not support categoricals with format="fixed". The following example
import pandas as pd
s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')
s.to_hdf('s.h5', key='s')
raises this error:
NotImplementedError: Cannot store a category dtype in a HDF5 dataset that uses format="fixed". Use format="table".
How do I store the categorical series using format='table'?
Specify format='table' (or the shorthand format='t') in pd.Series.to_hdf:
s.to_hdf('s.h5', key='s', format='t')
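A slightly fuller sketch of the same call writes the series in table format and reads it back with pd.read_hdf to check that the category dtype survives the round trip. It assumes PyTables (the `tables` package) is installed, and it uses a temporary file instead of 's.h5' purely so the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

# Categorical series from the question
s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')  # illustrative temp path, not 's.h5'
    # format='t' is shorthand for format='table' (needs PyTables)
    s.to_hdf(path, key='s', format='t')
    # Read the stored series back from the same key
    restored = pd.read_hdf(path, key='s')

print(restored.dtype)  # category
print(list(restored))  # ['a', 'b', 'a', 'b']
```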
Note that this is also what the error message advises. As per the docs:
format : ‘fixed(f)|table(t)’, default is ‘fixed’
fixed(f) : Fixed format. Fast writing/reading. Not-appendable, nor searchable.
table(t) : Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.
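The "more flexible operations" of the table format can be illustrated with a small sketch that appends a second chunk to the stored categorical series and then reads back only some rows with a where query on the index. It again assumes PyTables is installed; the second chunk, its index values and the temporary path are made up for illustration. The appended chunk deliberately uses the same categories as the first, since mixing different categories in one table is not something this sketch relies on.

```python
import os
import tempfile

import pandas as pd

first = pd.Series(['a', 'b', 'a', 'b'], dtype='category')
# Hypothetical second chunk: same categories, index continuing after 3
second = pd.Series(pd.Categorical(['b', 'a'], categories=['a', 'b']), index=[4, 5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')
    first.to_hdf(path, key='s', format='table')
    # append=True adds rows to the existing table (fixed format is not appendable)
    second.to_hdf(path, key='s', format='table', append=True)
    # where= selects a subset of rows; here the query is on the index
    subset = pd.read_hdf(path, key='s', where='index >= 3')

print(list(subset.index))  # [3, 4, 5]
print(list(subset))        # ['b', 'b', 'a']
```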