I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.
The financial data is being updated every minute, I need to append a row of data to all of my existing tables inside of my .h5 file every minute.
Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of just appending data.
HDFStore way:
from pandas import HDFStore

# open (or create) the hdf5 file
save_hdf = HDFStore('test.h5')
ohlcv_candle.to_hdf('test.h5')
# give the dataframe a key value
# format='table' so we can append data
save_hdf.put('name_of_frame', ohlcv_candle, format='table', data_columns=True)
# print our dataframe by reading it back from the hdf file with the key
# just doing this as a test
print(save_hdf['name_of_frame'])
The other way I have tried it, to_hdf:
# format='t' so we can append data, mode='r+' to specify that the file exists
# and we want to append to it
tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', mode='r+', format='t')
#again just printing to check if it worked
print(pd.read_hdf('test.h5', key='this_is_a_key'))
Here is what one of the DataFrames looks like after being read back with read_hdf:
time open high low close volume PP
0 1505305260 3137.89 3147.15 3121.17 3146.94 6.205397 3138.420000
1 1505305320 3146.86 3159.99 3130.00 3159.88 8.935962 3149.956667
2 1505305380 3159.96 3160.00 3159.37 3159.66 4.524017 3159.676667
3 1505305440 3159.66 3175.51 3151.08 3175.51 8.717610 3167.366667
4 1505305500 3175.25 3175.53 3170.44 3175.53 3.187453 3173.833333
The next time I get data (every minute), I would like a row of it added at index 5 across all my columns, and then 6 and 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.
P.S. sorry for the formatting of that table in here
Exporting a pandas DataFrame to an HDF5 file: The method to_hdf() exports a pandas DataFrame object to an HDF5 file. The HDF5 group under which the DataFrame is stored is specified through the key parameter.
Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the “fixed” format.
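As a quick, hypothetical illustration of to_hdf() and the key parameter (the file name, key, and data here are made up):

import pandas as pd

df = pd.DataFrame({'open': [3137.89], 'close': [3146.94]})
# format='table' (alias 't') writes a PyTables table that can be appended to later;
# the default 'fixed' format is faster to write but does not support appending
df.to_hdf('example.h5', key='ohlcv', format='table')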
The append() function is used to append rows of another DataFrame to the end of the given DataFrame, returning a new DataFrame object. Columns not in the original DataFrame are added as new columns and the new cells are populated with NaN values. Parameters: other : DataFrame or Series/dict-like object, or list of these.
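For the in-memory case, a small sketch (note that DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat() does the same job):

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
new_rows = pd.DataFrame({'a': [3], 'b': ['x']})  # 'b' is a new column
# rows are added at the end; cells for columns missing from either frame become NaN
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)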
pandas.HDFStore.put() has an append parameter (which defaults to False); the default instructs Pandas to overwrite instead of appending.
So try this:
store = pd.HDFStore('test.h5')
store.append('name_of_frame', ohlcv_candle, format='t', data_columns=True)
We can also use store.put(..., append=True), but the table must also be created in table format:
store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)
NOTE: appending works only for the table format (format='t' is an alias for format='table').
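For the minute-by-minute use case in the question, a sketch along these lines should work (new_row is a hypothetical one-row DataFrame; its columns must match the table already stored under the key):

import pandas as pd

# hypothetical one-row DataFrame standing in for the data fetched each minute
new_row = pd.DataFrame([{'time': 1505305560, 'open': 3175.50, 'high': 3176.20,
                         'low': 3174.90, 'close': 3176.00, 'volume': 2.5,
                         'PP': 3175.70}])

store = pd.HDFStore('test.h5')
# only the new row is written; the rows already on disk are not read or rewritten
store.append('name_of_frame', new_row, format='t', data_columns=True)
store.close()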
tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', append=True, mode='r+', format='t')
You need to pass another argument, append=True, to specify that the data is to be appended to the existing data found under that key, instead of overwriting it. Without this, the default is False, and if an existing table is found under 'this_is_a_key' it gets overwritten.
The mode= argument applies only at the file level, telling whether the file as a whole is to be overwritten or appended to. One file can have any number of keys, so a mode='a', append=False setting means only one key gets overwritten while the other keys stay.
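To illustrate the file-level vs. key-level distinction (a sketch with made-up file and frame names):

import pandas as pd

df_a = pd.DataFrame({'x': [1, 2]})
df_b = pd.DataFrame({'y': [3, 4]})

df_a.to_hdf('multi.h5', key='frame_a', mode='a', format='t')
df_b.to_hdf('multi.h5', key='frame_b', mode='a', format='t')

# mode='a' keeps the file and its other keys; append=False (the default)
# replaces only the table stored under 'frame_a', leaving 'frame_b' intact
df_a.to_hdf('multi.h5', key='frame_a', mode='a', format='t', append=False)

# mode='w' would instead truncate the whole file, dropping 'frame_b' as well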
I had a similar experience to yours and found the additional append argument in the reference doc. After setting it, it now appends properly for me.
Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
Note: HDF5 won't do anything clever with the DataFrame's index. Appended chunks keep whatever index values they were written with, so you need to sort the index out yourself either before putting the data in or after reading it back out.
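For example, after several appends the stored index can contain repeated values (0..4, 0..4, ...); one way to clean it up after reading (assuming the file and key from above):

import pandas as pd

df = pd.read_hdf('test.h5', key='name_of_frame')
# rebuild a clean 0..N-1 index in memory, discarding the duplicated one from disk
df = df.reset_index(drop=True)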