Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Append data to HDF5 file with Pandas, Python

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.

The financial data is being updated every minute, I need to append a row of data to all of my existing tables inside of my .h5 file every minute.

Here is what i have tried so far, but no matter what i do, it overwrites the .h5 file and does not just append data.

HDFStore way:

#we open the hdf5 file
save_hdf = HDFStore('test.h5') 

ohlcv_candle.to_hdf('test.h5')

#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame',ohlcv_candle, format='table',  data_columns=True)

#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])    

The other way I have tried it, to_hdf:

#format=t so we can append data , mode=r+ to specify the file exists and
#we want to append to it
tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', mode='r+', format='t')

#again just printing to check if it worked 
print(pd.read_hdf('test.h5', key='this_is_a_key'))

Here is what one of the DataFrames looks like after being read_hdf:

           time     open     high      low    close     volume           PP  
0    1505305260  3137.89  3147.15  3121.17  3146.94   6.205397  3138.420000   
1    1505305320  3146.86  3159.99  3130.00  3159.88   8.935962  3149.956667   
2    1505305380  3159.96  3160.00  3159.37  3159.66   4.524017  3159.676667   
3    1505305440  3159.66  3175.51  3151.08  3175.51   8.717610  3167.366667   
4    1505305500  3175.25  3175.53  3170.44  3175.53   3.187453  3173.833333  

The next time I am getting data (every minute), i would like a row of it added to index 5 of all my columns..and then 6 and 7 ..and so on, without having to read and manipulate the entire file in memory as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.

P.S. sorry for the formatting of that table in here

like image 803
Karl Avatar asked Sep 13 '17 20:09

Karl


People also ask

How do I save pandas DataFrame to HDF5?

Exporting a pandas DataFrame to a HDF5 file: The method to_hdf() exports a pandas DataFrame object to a HDF5 File. The HDF5 group under which the pandas DataFrame has to be stored is specified through the parameter key.

Can pandas read HDF5?

Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the “fixed” format.

How do you append datasets in Python?

append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. Parameters: other : DataFrame or Series/dict-like object, or list of these.


2 Answers

pandas.HDFStore.put() has parameter append (which defaults to False) - that instructs Pandas to overwrite instead of appending.

So try this:

store = pd.HDFStore('test.h5')

store.append('name_of_frame', ohlcv_candle, format='t',  data_columns=True)

we can also use store.put(..., append=True), but this file should also be created in a table format:

store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)

NOTE: appending works only for the table (format='t' - is an alias for format='table') format.

like image 127
MaxU - stop WAR against UA Avatar answered Oct 20 '22 12:10

MaxU - stop WAR against UA


tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', append=True, mode='r+', format='t')

You need to pass another argument append=True to specify that the data is to be appended to existing data if found under that key, instead of over-writing it.

Without this, the default is False and if it encounters an existing table under 'this_is_a_key' then it overwrites.

The mode= argument is only at file-level, telling whether the file as a whole is to be overwritten or appended.

One file can have any number of keys, so a mode='a', append=False setting will mean only one key gets over-written while the other keys stay.

I had a similar experience as yours and found the additional append argument in the reference doc. After setting it, now it's appending properly for me.

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Note: hdf5 won't bother doing anything with the dataframe's indexes. We need to iron those out before putting the data in or when we take it out.

like image 20
Nikhil VJ Avatar answered Oct 20 '22 12:10

Nikhil VJ