How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv('filename.csv')

Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):

store = pd.HDFStore('store.h5')
store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html
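To check which format that dictionary-style assignment produces, here is a minimal sketch. It assumes PyTables (the "tables" package) is installed, and the small DataFrame and temporary file path are hypothetical stand-ins for the CSV data. It looks up the storer for the key with HDFStore.get_storer() and reads its is_table attribute, which is False for the default "fixed" format (shown as "frame" in the store listing).

```python
import os
import tempfile

import pandas as pd  # writing HDF5 files also requires the PyTables package ("tables")

# Hypothetical stand-in for the DataFrame loaded from CSV in the question.
df = pd.DataFrame({"colA": [1, 2, 3], "colB": [4.0, 5.0, 6.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.h5")  # hypothetical temporary file
    with pd.HDFStore(path) as store:
        store["df"] = df  # dictionary-style assignment, as in the question
        # get_storer() returns the object describing how "df" was written.
        is_table = store.get_storer("df").is_table
        roundtrip = store["df"]

print(is_table)  # False: the default "fixed" format is not a table
```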

When I look at the contents, the object is stored as a frame:

store 

outputs

<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[552,23252])

However, to use indexing, the DataFrame has to be stored as a table rather than as a frame.

My approach was to try HDFStore.put(), i.e.:

HDFStore.put(key="store.h", value=df, format=Table)

However, this fails with the error:

TypeError: put() missing 1 required positional argument: 'self'
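The error itself is not specific to pandas: put() is an instance method, and HDFStore.put(...) calls it on the class, so Python has no object to pass as self. A minimal pure-Python illustration, using a hypothetical Store class rather than pandas' own:

```python
class Store:
    """Hypothetical stand-in for any class with an instance method named put."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        # self is filled in automatically only when put is called on an instance.
        self.data[key] = value


error_message = ""
# Calling the method on the class itself leaves self unfilled, so Python raises TypeError.
try:
    Store.put(key="df", value=[1, 2, 3])
except TypeError as exc:
    error_message = str(exc)

# Calling it on an instance works, because Python passes the instance as self.
store = Store()
store.put(key="df", value=[1, 2, 3])
print(error_message)
```

The corresponding pandas call goes on an HDFStore instance, with the format given as the string 'table' and the key naming the node inside the file (not the file name), e.g. store.put('df', df, format='table').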

How does one save Pandas Dataframes as PyTables tables?

JianguoHisiang asked Mar 11 '23 16:03


1 Answer

Common part: create a new HDFStore file or open an existing one:

store = pd.HDFStore('store.h5')

Try this if you want all columns indexed:

store.append('key_name', df, data_columns=True)

or this if you want only a subset of columns indexed:

store.append('key_name', df, data_columns=['colA','colC','colN'])
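A sketch of what the data_columns choice buys you, assuming PyTables is installed. The example frame, its values, the temporary path, and the where condition are hypothetical; only the column names come from the answer. With data_columns=['colA', 'colC'], those two columns can appear in a where clause passed to HDFStore.select(), while colB is stored but is not available for filtering. data_columns=True would make colB queryable as well.

```python
import os
import tempfile

import pandas as pd  # requires the PyTables package ("tables")

# Hypothetical frame; only the column names come from the answer.
df = pd.DataFrame(
    {
        "colA": [1, 2, 3, 4],
        "colB": ["w", "x", "y", "z"],
        "colC": [0.5, 1.5, 2.5, 3.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.h5")  # hypothetical temporary file
    with pd.HDFStore(path) as store:
        # Only colA and colC become data columns that can be used in `where`.
        store.append("key_name", df, data_columns=["colA", "colC"])
        # Filter on the data columns while reading; `columns` limits what is returned.
        result = store.select(
            "key_name", where="colA > 1 & colC < 3.0", columns=["colA", "colC"]
        )

print(result)
```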

PS: HDFStore.append() saves DataFrames in table format by default.
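A minimal sketch confirming that, and also showing the put() call from the question written against a store instance. It assumes PyTables is installed; the data, keys, and temporary path are hypothetical. put() writes the fixed format unless format='table' is passed, whereas append() writes a table without any format argument. A table-format key can then be queried straight from the file with pd.read_hdf().

```python
import os
import tempfile

import pandas as pd  # requires the PyTables package ("tables")

# Hypothetical example data.
df = pd.DataFrame({"colA": [1, 2, 3], "colB": [10.0, 20.0, 30.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.h5")  # hypothetical temporary file
    with pd.HDFStore(path) as store:
        # put() is called on the instance; format must be the string "table".
        store.put("df_put", df, format="table", data_columns=True)
        # append() writes table format even without a format argument.
        store.append("df_append", df)
        put_is_table = store.get_storer("df_put").is_table
        append_is_table = store.get_storer("df_append").is_table
    # read_hdf opens the file itself and applies the where filter to the table.
    queried = pd.read_hdf(path, "df_put", where="colA >= 2")

print(put_is_table, append_is_table)
```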

MaxU - stop WAR against UA answered Apr 06 '23 22:04