
Get last row in pandas HDF5 query

I am trying to get the index of the last row of a pandas dataframe stored in HDF5 without having to pull the whole dataset or index into memory. I am looking for something like this:

from pandas import HDFStore

store = HDFStore('file.h5')

last_index = store.select('dataset', where='index == -1').index

Except that in my case the last index won't be -1 but a Timestamp.

baconwichsand asked Dec 19 '22 03:12



1 Answer

Use the start= and stop= arguments of select, which work like positional indexers. Get the row count from the storer, then select only the last row:

In [7]: import numpy as np; import pandas as pd

In [8]: df = pd.DataFrame({'A' : np.random.randn(10000)},index=pd.date_range('20130101',periods=10000,freq='s'))

In [9]: store = pd.HDFStore('test.h5',mode='w')

In [10]: store.append('df',df)

In [11]: nrows = store.get_storer('df').nrows

In [12]: nrows
Out[12]: 10000

In [13]: store.select('df',start=nrows-1,stop=nrows)
Out[13]: 
                            A
2013-01-01 02:46:39 -0.890721

In [15]: df.iloc[[-1]]
Out[15]: 
                            A
2013-01-01 02:46:39 -0.890721
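
To get just the index value the question asks for, the same start=/stop= selection can be wrapped in a small helper. This is a minimal sketch, not part of the original session: the name last_index_value is hypothetical, and it assumes the data was written in table format (for example with store.append), so that the storer reports nrows.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def last_index_value(path, key):
    """Hypothetical helper: return the index label of the last stored row.

    Only the final row is read from disk, via positional start/stop.
    Assumes `key` was written in table format (e.g. with store.append).
    """
    with pd.HDFStore(path, mode="r") as store:
        nrows = store.get_storer(key).nrows
        if not nrows:
            # Empty (or unknown-length) dataset: nothing to return.
            return None
        last_row = store.select(key, start=nrows - 1, stop=nrows)
        return last_row.index[-1]


# Build the same kind of frame as in the session above, in a temporary file.
df = pd.DataFrame(
    {"A": np.random.randn(10000)},
    index=pd.date_range("20130101", periods=10000, freq="s"),
)
path = os.path.join(tempfile.mkdtemp(), "test.h5")
with pd.HDFStore(path, mode="w") as store:
    store.append("df", df)

last = last_index_value(path, "df")
print(last)  # 2013-01-01 02:46:39
```

The result is a Timestamp, so it can be compared directly against new data before appending, without loading the stored index into memory.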
Jeff answered Dec 21 '22 17:12
