
Pytables table into pandas DataFrame

There is lots of information on how to read a CSV into a pandas DataFrame, but what I have is a PyTables table, and I want a pandas DataFrame.

I've found how to store my pandas DataFrame to PyTables. Now I want to read it back; at this point it will have:

"kind = v._v_attrs.pandas_type"  

I could write it out as CSV and re-read it, but that seems silly. It is what I am doing for now.

How should I be reading PyTables objects into pandas?

Jim Knoll asked Oct 16 '12 22:10


2 Answers

import tables as pt
import pandas as pd
import numpy as np

# the content is junk but we don't care
# (the dtype is a list of (name, type) pairs; the array is 1-D, one record per row)
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])

# write to a PyTables table
# (PyTables 3.x names; PyTables 2.x spelled these openFile and createTable)
handle = pt.open_file('/tmp/test_pandas.h5', 'w')
handle.create_table('/', 'grades', grades)
print(handle.root.grades[:].dtype)  # it is a structured array

# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
print(df.dtypes)

handle.close()

Beware that, at the time of writing, your u2 (unsigned 2-byte integer) ends up as an i8 (8-byte integer) and the strings become objects, because pandas did not yet support the full range of dtypes available for NumPy arrays. Newer pandas releases keep the narrower integer dtypes, so check df.dtypes on your version.
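If the dtypes matter, one option is to convert them explicitly after loading. The following is only a sketch: the `records` array and its sample values are made up here as a stand-in for what `handle.root.grades[:]` returns, so it runs without an HDF5 file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for handle.root.grades[:], which is a NumPy structured array.
records = np.zeros(3, dtype=[('name', 'S20'), ('grade', 'u2')])
records['name'] = [b'ann', b'bob', b'cy']
records['grade'] = [90, 85, 70]

df = pd.DataFrame.from_records(records)

# Decode the byte strings to text and set the integer width explicitly,
# instead of relying on whatever the installed pandas version infers.
df['name'] = df['name'].str.decode('ascii')
df['grade'] = df['grade'].astype('uint16')

print(df.dtypes)
```

Setting the dtypes by hand keeps the result the same across pandas versions, at the cost of listing the columns yourself.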

meteore answered Oct 08 '22 03:10


The docs now include an excellent section on using the HDF5 store and there are some more advanced strategies discussed in the cookbook.

It's now relatively straightforward (after from pandas import HDFStore, DataFrame):

In [1]: store = HDFStore('store.h5')

In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [4]: store['df'] = df

In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[2,2])

And to retrieve from HDF5/pytables:

In [6]: store['df']  # store.get('df') is an equivalent
Out[6]:
   A  B
0  1  2
1  3  4
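The same round trip can probably be written without holding a store object open, using `DataFrame.to_hdf` and `pandas.read_hdf`. This is a minimal sketch, assuming PyTables is installed; the temporary file path is just an illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# Illustrative location only; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'store.h5')

# Write the frame under the key 'df', then read it straight back.
df.to_hdf(path, key='df', mode='w')
restored = pd.read_hdf(path, key='df')

print(restored)
```

Note that this reads files written by pandas. A table created directly with PyTables is not guaranteed to carry the pandas metadata, so loading it through the structured array, as in the other answer, may be the safer route there.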

You can also query within a table.
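As a rough sketch of such a query, assuming PyTables is installed: the frame has to be stored in table format, because the default fixed format cannot be queried. The file path and the condition `A > 1` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# Illustrative location only; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'store.h5')

with pd.HDFStore(path) as store:
    # format='table' makes the data queryable;
    # data_columns=True lets every column appear in a where clause.
    store.put('df', df, format='table', data_columns=True)

    # Only the rows matching the condition are read from disk.
    subset = store.select('df', where='A > 1')

print(subset)
```

Table format is slower to write than fixed format, so it is worth using mainly when you need to select subsets or append rows.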

Andy Hayden answered Oct 08 '22 01:10