I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.
The financial data is being updated every minute, I need to append a row of data to all of my existing tables inside of my .h5 file every minute.
Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of just appending data.
HDFStore way:
from pandas import HDFStore

# open (or create) the hdf5 file
save_hdf = HDFStore('test.h5')
ohlcv_candle.to_hdf('test.h5')
# give the dataframe a key value
# format='table' so we can append data
save_hdf.put('name_of_frame', ohlcv_candle, format='table', data_columns=True)
# print our dataframe by reading it back from the hdf file with the key
# just doing this as a test
print(save_hdf['name_of_frame'])
The other way I have tried it, to_hdf:
# format='t' so we can append data, mode='r+' to specify that the file exists
# and we want to append to it
tohlcv_candle.to_hdf('test.h5', key='this_is_a_key', mode='r+', format='t')
#again just printing to check if it worked
print(pd.read_hdf('test.h5', key='this_is_a_key'))
Here is what one of the DataFrames looks like after being read back with read_hdf:
time open high low close volume PP
0 1505305260 3137.89 3147.15 3121.17 3146.94 6.205397 3138.420000
1 1505305320 3146.86 3159.99 3130.00 3159.88 8.935962 3149.956667
2 1505305380 3159.96 3160.00 3159.37 3159.66 4.524017 3159.676667
3 1505305440 3159.66 3175.51 3151.08 3175.51 8.717610 3167.366667
4 1505305500 3175.25 3175.53 3170.44 3175.53 3.187453 3173.833333
The next time I get data (every minute), I would like a row of it added at index 5 across all my columns, and then 6 and 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.
P.S. sorry for the formatting of that table in here
Exporting a pandas DataFrame to an HDF5 file: The method to_hdf() exports a pandas DataFrame object to an HDF5 file. The HDF5 group under which the DataFrame is stored is specified through the key parameter.
Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the “fixed” format.
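As a quick, hypothetical illustration of to_hdf() and the key parameter (the file name, key, and data here are made up):

import pandas as pd

df = pd.DataFrame({'open': [3137.89], 'close': [3146.94]})
# format='table' (alias 't') writes a PyTables table that can be appended to later;
# the default 'fixed' format is faster to write but does not support appending
df.to_hdf('example.h5', key='ohlcv', format='table')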
The append() function is used to append rows of another DataFrame to the end of the given DataFrame, returning a new DataFrame object. Columns not in the original DataFrame are added as new columns and the new cells are populated with NaN values. Parameters: other : DataFrame or Series/dict-like object, or list of these.
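For the in-memory case, a small sketch (note that DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat() does the same job):

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
new_rows = pd.DataFrame({'a': [3], 'b': ['x']})  # 'b' is a new column
# rows are added at the end; cells for columns missing from either frame become NaN
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)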
pandas.HDFStore.put() has an append parameter (which defaults to False); the default instructs Pandas to overwrite instead of appending.
So try this:
store = pd.HDFStore('test.h5')
store.append('name_of_frame', ohlcv_candle, format='t', data_columns=True)
We can also use store.put(..., append=True), but the table must also be created in table format:
store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)
NOTE: appending works only for the table format (format='t' is an alias for format='table').
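For the minute-by-minute use case in the question, a sketch along these lines should work (new_row is a hypothetical one-row DataFrame; its columns must match the table already stored under the key):

import pandas as pd

# hypothetical one-row DataFrame standing in for the data fetched each minute
new_row = pd.DataFrame([{'time': 1505305560, 'open': 3175.50, 'high': 3176.20,
                         'low': 3174.90, 'close': 3176.00, 'volume': 2.5,
                         'PP': 3175.70}])

store = pd.HDFStore('test.h5')
# only the new row is written; the rows already on disk are not read or rewritten
store.append('name_of_frame', new_row, format='t', data_columns=True)
store.close()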
tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', append=True, mode='r+', format='t')
You need to pass another argument, append=True, to specify that the data is to be appended to the existing data found under that key, instead of overwriting it. Without this, the default is False, and if an existing table is found under 'this_is_a_key' it gets overwritten.
The mode= argument applies only at the file level, telling whether the file as a whole is to be overwritten or appended to. One file can have any number of keys, so a mode='a', append=False setting means only one key gets overwritten while the other keys stay.
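To illustrate the file-level vs. key-level distinction (a sketch with made-up file and frame names):

import pandas as pd

df_a = pd.DataFrame({'x': [1, 2]})
df_b = pd.DataFrame({'y': [3, 4]})

df_a.to_hdf('multi.h5', key='frame_a', mode='a', format='t')
df_b.to_hdf('multi.h5', key='frame_b', mode='a', format='t')

# mode='a' keeps the file and its other keys; append=False (the default)
# replaces only the table stored under 'frame_a', leaving 'frame_b' intact
df_a.to_hdf('multi.h5', key='frame_a', mode='a', format='t', append=False)

# mode='w' would instead truncate the whole file, dropping 'frame_b' as well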
I had a similar experience to yours and found the additional append argument in the reference doc. After setting it, it now appends properly for me.
Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
Note: HDF5 won't do anything clever with the DataFrame's index. Appended chunks keep whatever index values they were written with, so you need to sort the index out yourself either before putting the data in or after reading it back out.
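For example, after several appends the stored index can contain repeated values (0..4, 0..4, ...); one way to clean it up after reading (assuming the file and key from above):

import pandas as pd

df = pd.read_hdf('test.h5', key='name_of_frame')
# rebuild a clean 0..N-1 index in memory, discarding the duplicated one from disk
df = df.reset_index(drop=True)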