
Pandas Dataframe Mask based on index

I have the following dataframe:

import pandas as pd
index = pd.date_range('2013-1-1',periods=10,freq='15Min')
data = pd.DataFrame(data=[1,2,3,4,5,6,7,8,9,0], columns=['value'], index=index)

How can I generate a mask based on the index value? I know I can do something like:

data['value'] > 3
Out[40]: 
2013-01-01 00:00:00    False
2013-01-01 00:15:00    False
2013-01-01 00:30:00    False
2013-01-01 00:45:00     True
2013-01-01 01:00:00     True
2013-01-01 01:15:00     True
2013-01-01 01:30:00     True
2013-01-01 01:45:00     True
2013-01-01 02:00:00     True
2013-01-01 02:15:00    False
Freq: 15T, Name: value, dtype: bool

I want to generate a mask that only considers rows whose index falls in a certain time range. I was thinking of something like data['index'].time() > datetime.time(1,15), but data['index'] fails because index is not the name of a column. How can I reference the index value of each row when building a mask?

BrandonAGr asked Jul 09 '13 23:07



1 Answer

You can select rows by time of day with the index's indexer_between_time method, which returns the integer positions of the matching rows:

In [11]: data.index.indexer_between_time(start_time='01:15', end_time='02:00')
Out[11]: array([5, 6, 7, 8])

In [12]: data.iloc[data.index.indexer_between_time(start_time='01:15', end_time='02:00')]
Out[12]:
                     value
2013-01-01 01:15:00      6
2013-01-01 01:30:00      7
2013-01-01 01:45:00      8
2013-01-01 02:00:00      9

As you can see, you access the index by the attribute .index.
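
If a boolean mask is what you need, as in the question, one option is to compare the time-of-day values of the index directly. This is a minimal sketch that rebuilds the frame from the question; the names mask and selected are only illustrative.

```python
import datetime

import pandas as pd

# same frame as in the question
index = pd.date_range('2013-1-1', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# DatetimeIndex.time gives an array of datetime.time objects, one per row,
# so an elementwise comparison yields a boolean array
mask = data.index.time > datetime.time(1, 15)

# the mask can be combined with a condition on a column
selected = data[mask & (data['value'] > 3).to_numpy()]
```

Here mask is True for the rows strictly after 01:15, and selected keeps only those of them whose value is greater than 3.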

Note: by default, indexer_between_time has both include_start and include_end set to True, so both endpoints are included. It also offers a tz argument to compare the times in a different timezone.
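
To illustrate the note above, the sketch below makes the end of the range exclusive and then turns the returned positions into a boolean mask. It assumes the frame from the question and a pandas version where include_end is still accepted by indexer_between_time; positions, mask and subset are illustrative names.

```python
import numpy as np
import pandas as pd

# same frame as in the question
index = pd.date_range('2013-1-1', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# include_end=False leaves out the row that falls exactly on 02:00
positions = data.index.indexer_between_time('01:15', '02:00', include_end=False)

# convert the integer positions into a boolean mask aligned with the rows
mask = np.zeros(len(data), dtype=bool)
mask[positions] = True

subset = data[mask]
```

A boolean mask like this is handy when the time condition has to be combined with other conditions using & or |, which integer positions do not allow directly.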

Andy Hayden answered Oct 14 '22 07:10
