Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update pandas DataFrame in stored in a Pytable with another pandas DataFrame

I am trying to create a function that updates a pandas DataFrame stored that I have stored in a PyTable with new data from a pandas DataFrame. I want to check if some data is missing in the PyTable for specific DatetimeIndexes (value is NaN or a new Timestamp is available), replace this with new values from a given pandas DataFrame and append this to the Pytable. Basically, just update a Pytable. I can get the combined DataFrame using the combine_first method in Pandas. Below the Pytable is created with dummy data:

import pandas as pd
import numpy as np
import datetime as dt
index = pd.DatetimeIndex(start = dt.datetime(2001,1,1,0,0), periods = 20000,freq='10T')
data_in_pytable = pd.DataFrame(index=index,data=np.random.randn(20000,2),columns=['value_1','value_2'])
data.to_hdf(r'C:\pytable.h5','test',mode='r+',append=True,complevel=9,complib='zlib')

So the pytable is created. Assuming I have another dataFrame with which I want to update the Pytable with:

new_index = pd.DatetimeIndex(start = dt.datetime(2001,5,1,0,0), periods = 10000,freq='10T')
data_to_update=pd.DataFrame(index=new_index,data=np.random.randn(10000,2),columns=['value_1','value_2'])
store=pd.HDFStore(r'C:\pytable.h5',mode='r+',complevel=9,complib='zlib')
store.append('test',store.select('test').combine_first(data_to_update))
store.close()

The problem is that the PyTable keeps the original values, does not update the existing ones. I now have duplicate entries (by index) because the original values are not overwritten.

Summary: How can I update a PyTable with another DataFrame?

Thanks, Elv

like image 924
Elvin Avatar asked Jun 10 '13 09:06

Elvin


2 Answers

This is not supported currently. PyTables does support an update method, but is not implemented in pandas.

Easiest method is to either use mode='w' and write a new file or

store.remove(key); store.append(.....)

HDF5 is not a 'regular' database and updating is not a common operation, if you need that SQL might be an option.

Feel free to request update as an enhancement as an issue.

like image 130
Jeff Avatar answered Nov 14 '22 23:11

Jeff


In the end, I found it out myself. In my case, when it is okay to overwrite the entire Node as the "combine_first" gets you the original and the new values', it is fine to use

store.put(key,value,table=True,append=False) 

instead of the

store.append(key,value).
like image 21
Elvin Avatar answered Nov 15 '22 00:11

Elvin