
In a Pandas categorical, what is format="table"?

The HDF5 format apparently does not support categoricals with format="fixed". The following example:

import pandas as pd

s = pd.Series(['a','b','a','b'],dtype='category')
s.to_hdf('s.h5', key='s')

raises the error:

NotImplementedError: Cannot store a category dtype in a HDF5 dataset that uses format="fixed". Use format="table".

How do I construct the categorical series with format='table'?

Autumn asked May 04 '18 00:05


1 Answer

Specify format='table' or format='t' in pd.Series.to_hdf:

s.to_hdf('s.h5', key='s', format='t')
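To check that the categorical dtype survives the round trip, here is a minimal sketch. It assumes the optional PyTables package (`tables`) is installed, and it writes to a temporary directory instead of `s.h5`; the `path` and `restored` names are only illustrative.

```python
import os
import tempfile

import pandas as pd  # HDF5 support also needs the optional PyTables ("tables") package

s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')

# Write to a throwaway directory so the example leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')
    # format='table' (or 't') is what the error message asks for.
    s.to_hdf(path, key='s', format='table')
    restored = pd.read_hdf(path, key='s')

print(restored.dtype)
print(restored.tolist())
```

Nothing changes in how the Series itself is constructed; only the `format` argument at write time differs.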

Note that this is also what the error message advises. As per the docs:

format : ‘fixed(f)|table(t)’, default is ‘fixed’

fixed(f) : Fixed format. Fast writing/reading. Not-appendable, nor searchable.

table(t) : Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.
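As a rough illustration of the "searching / selecting subsets" part, the sketch below reads back only some rows of a table-format store by filtering on the index with `where`. It again assumes PyTables is installed; the temporary path and the `subset` name are hypothetical, chosen just for this example.

```python
import os
import tempfile

import pandas as pd  # requires the optional PyTables ("tables") package for HDF5 I/O

s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')
    s.to_hdf(path, key='s', format='table')
    # Table-format stores can be queried on read; here only rows with
    # index label >= 2 are loaded.
    subset = pd.read_hdf(path, key='s', where='index >= 2')

print(subset)
```

A fixed-format store does not support this kind of query, which is the trade-off the documentation describes.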

jpp answered Nov 15 '22 07:11
