Fast selection of a time interval in a pandas DataFrame/Series

Tags: python, indexing, pandas

My problem is that I want to filter a DataFrame to only include times within the interval [start, end). If I do not care about the day, I would like to filter only by start and end time for each day. I have a solution for this, but it is slow. So my question is whether there is a faster way to do the time-based filtering.

Example

import pandas as pd
import time


index=pd.date_range(start='2012-11-05 01:00:00', end='2012-11-05 23:00:00', freq='1S').tz_localize('UTC')
df=pd.DataFrame(range(len(index)), index=index, columns=['Number'])

# select from 1 to 2 am, include day
now=time.time()
df2=df.loc['2012-11-05 01:00:00':'2012-11-05 02:00:00']
print('Took %s seconds' % (time.time()-now))  # 0.0368609428406

# select from 1 to 2 am, for every day
now=time.time()
selector=(df.index.hour>=1) & (df.index.hour<2)
df3=df[selector]
print('Took %s seconds' % (time.time()-now))  # 0.0699911117554

As you can see, if I ignore the day (second case) it takes almost twice as long. The computation time increases rapidly if the index spans several days, e.g. from 5 to 7 Nov:

index=pd.date_range(start='2012-11-05 01:00:00', end='2012-11-07 23:00:00', freq='1S').tz_localize('UTC')

So, to summarize: is there a faster method to filter by time of day across many days?

Thx

890

asked Feb 02 '14 14:02

Mannaggia

1 Answer

You need the between_time method.

In [14]: %timeit df.between_time(start_time='01:00', end_time='02:00')
100 loops, best of 3: 10.2 ms per loop

In [15]: %timeit selector=(df.index.hour>=1) & (df.index.hour<2); df[selector]
100 loops, best of 3: 18.2 ms per loop

I ran these tests with an index spanning 5 to 7 November.
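The %timeit lines above only run inside IPython. As a rough sketch, the same comparison could be reproduced in a plain script with the standard timeit module. The helper names with_between_time and with_hour_mask and the repeat count are made up for this illustration, and the index is built with a Timedelta frequency and tz='UTC' rather than the question's freq='1S' plus tz_localize, which should give the same index. Timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Index spanning 5 to 7 November at one-second resolution, as in the question.
index = pd.date_range(
    start="2012-11-05 01:00:00",
    end="2012-11-07 23:00:00",
    freq=pd.Timedelta(seconds=1),
    tz="UTC",
)
df = pd.DataFrame(range(len(index)), index=index, columns=["Number"])


def with_between_time():
    # between_time includes both endpoints by default.
    return df.between_time("01:00", "02:00")


def with_hour_mask():
    # Half-open [01:00, 02:00): rows stamped exactly 02:00:00 are excluded.
    hours = df.index.hour
    return df[(hours >= 1) & (hours < 2)]


repeats = 10  # arbitrary choice; raise it for steadier numbers
for func in (with_between_time, with_hour_mask):
    seconds = timeit.timeit(func, number=repeats) / repeats
    print("%s: %.4f s per call" % (func.__name__, seconds))

# The two selections differ only in the 02:00:00 row of each day.
extra_rows = len(with_between_time()) - len(with_hour_mask())
```

Note that the two approaches are not strictly equivalent: with its defaults, between_time also returns the row stamped exactly 02:00:00 on each day, while the hour mask does not.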

Documentation

Definition: df.between_time(self, start_time, end_time, include_start=True, include_end=True)
Docstring:
Select values between particular times of the day (e.g., 9:00-9:30 AM)

Parameters
----------
start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

Returns
-------
values_between_time : type of caller
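Since the question asks for the half-open interval [start, end), the end point has to be excluded explicitly. With the signature quoted above that would be include_end=False. The sketch below assumes pandas 1.4 or newer instead, where the inclusive argument takes over from those two flags. The variable names are only illustrative.

```python
import pandas as pd

index = pd.date_range(
    start="2012-11-05 01:00:00",
    end="2012-11-07 23:00:00",
    freq=pd.Timedelta(seconds=1),
    tz="UTC",
)
df = pd.DataFrame(range(len(index)), index=index, columns=["Number"])

# Assumption: pandas 1.4 or newer, where `inclusive` replaces the
# include_start / include_end flags. "left" keeps 01:00:00 and drops 02:00:00.
half_open = df.between_time("01:00", "02:00", inclusive="left")

# Same selection via the hour mask from the question, for comparison.
hours = df.index.hour
masked = df[(hours >= 1) & (hours < 2)]

same_rows = half_open.equals(masked)
latest_time = max(half_open.index.time)
```

If same_rows is True, the half-open between_time call selects exactly the rows the original hour mask did.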

103

answered Sep 25 '22 03:09

Nipun Batra
