I would like to get the 07h00 value every day, from a multiday DataFrame that has 24 hours of minute data in it each day.
import numpy as np
import pandas as pd
aframe = pd.DataFrame([np.arange(10000), np.arange(10000) * 2]).T
aframe.index = pd.date_range("2015-09-01", periods = 10000, freq = "1min")
aframe.head()
Out[174]:
0 1
2015-09-01 00:00:00 0 0
2015-09-01 00:01:00 1 2
2015-09-01 00:02:00 2 4
2015-09-01 00:03:00 3 6
2015-09-01 00:04:00 4 8
aframe.tail()
Out[175]:
0 1
2015-09-07 22:35:00 9995 19990
2015-09-07 22:36:00 9996 19992
2015-09-07 22:37:00 9997 19994
2015-09-07 22:38:00 9998 19996
2015-09-07 22:39:00 9999 19998
In this 10 000 row DataFrame spanning 7 days, how would I get the 7am value each day as efficiently as possible? Assume I might have to do this for very large tick databases so I value speed and low memory usage highly.
I know I can index with strings such as:
aframe.ix["2015-09-02 07:00:00"]
Out[176]:
0 1860
1 3720
Name: 2015-09-02 07:00:00, dtype: int64
But what I need is basically a wildcard style query for example
aframe.ix["* 07:00:00"]
You can use indexer_at_time
:
>>> locs = aframe.index.indexer_at_time('7:00:00')
>>> aframe.iloc[locs]
0 1
2015-09-01 07:00:00 420 840
2015-09-02 07:00:00 1860 3720
2015-09-03 07:00:00 3300 6600
2015-09-04 07:00:00 4740 9480
2015-09-05 07:00:00 6180 12360
2015-09-06 07:00:00 7620 15240
2015-09-07 07:00:00 9060 18120
There's also indexer_between_time
if you need select all indices that lie between two particular time of day.
Both of these methods return the integer locations of the desired values; the corresponding rows of the Series or DataFrame can be fetched with iloc
, as shown above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With