Lots of information on how to read a csv into a pandas dataframe, but I what I have is a pyTable table and want a pandas DataFrame.
I've found how to store my pandas DataFrame to pytables... then read I want to read it back, at this point it will have:
"kind = v._v_attrs.pandas_type"
I could write it out as csv and re-read it in but that seems silly. It is what I am doing for now.
How should I be reading pytable objects into pandas?
import tables as pt
import pandas as pd
import numpy as np
# the content is junk but we don't care
grades = np.empty((10,2), dtype=(('name', 'S20'), ('grade', 'u2')))
# write to a PyTables table
handle = pt.openFile('/tmp/test_pandas.h5', 'w')
handle.createTable('/', 'grades', grades)
print handle.root.grades[:].dtype # it is a structured array
# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
df.dtypes
Beware that your u2 (unsigned 2-byte integer) will end as an i8 (integer 8 byte), and the strings will be objects, because Pandas does not yet support the full range of dtypes that are available for Numpy arrays.
The docs now include an excellent section on using the HDF5 store and there are some more advanced strategies discussed in the cookbook.
It's now relatively straightforward:
In [1]: store = HDFStore('store.h5')
In [2]: print store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty
In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [4]: store['df'] = df
In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[2,2])
And to retrieve from HDF5/pytables:
In [6]: store['df'] # store.get('df') is an equivalent
Out[6]:
A B
0 1 2
1 3 4
You can also query within a table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With