Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

filter/select rows of pandas dataframe by timestamp column

I am new to pandas. I have dataframe with two columns dt (date-time stamp) and value.

Given two start and end data-time stamps: is there a easy way to create a new dataframe from original one that contains rows between the two date-time stamp?

                dt    value
84    7/23/2014 7:00  0.300
85    7/23/2014 7:05  0.300
86    7/23/2014 7:10  0.312
87    7/23/2014 7:15  0.300
88    7/23/2014 7:20  0.300
89    7/23/2014 7:25  0.300
90    7/23/2014 7:30  0.300
91    7/23/2014 7:35  0.300
92    7/23/2014 7:40  0.300
93    7/23/2014 7:45  0.216
94    7/23/2014 7:50  0.204
95    7/23/2014 7:55  0.228
96    7/23/2014 8:00  0.228
97    7/23/2014 8:05  0.228
98    7/23/2014 8:10  0.228
99    7/23/2014 8:15  0.240
100   7/23/2014 8:20  0.228
101   7/23/2014 8:25  0.216
102   7/23/2014 8:30  0.228
103   7/23/2014 8:35  0.324
104   7/23/2014 8:40  0.336
105   7/23/2014 8:45  0.324
106   7/23/2014 8:50  0.324
107   7/23/2014 8:55  0.324
108   7/23/2014 9:00  0.252
109   7/23/2014 9:05  0.252
110   7/23/2014 9:10  0.240
111   7/23/2014 9:15  0.240
112   7/23/2014 9:20  0.252
113   7/23/2014 9:25  0.240
..               ...    ...
198  7/23/2014 16:30  0.132
199  7/23/2014 16:35  0.120
200  7/23/2014 16:40  0.180
201  7/23/2014 16:45  0.216
202  7/23/2014 16:50  0.204
203  7/23/2014 16:55  0.192
like image 262
bajie88 Avatar asked Apr 14 '15 11:04

bajie88


3 Answers

So long as dt is a datetime dtype already you can filter using date strings, if not then you can convert doing this:

df['dt'] = pd.to_datetime(df['dt'])

Then filter:

In [115]:

df[(df['dt'] > '2014-07-23 07:30:00') & (df['dt'] < '2014-07-23 09:00:00')]
Out[115]:
                       dt  value
index                           
91    2014-07-23 07:35:00  0.300
92    2014-07-23 07:40:00  0.300
93    2014-07-23 07:45:00  0.216
94    2014-07-23 07:50:00  0.204
95    2014-07-23 07:55:00  0.228
96    2014-07-23 08:00:00  0.228
97    2014-07-23 08:05:00  0.228
98    2014-07-23 08:10:00  0.228
99    2014-07-23 08:15:00  0.240
100   2014-07-23 08:20:00  0.228
101   2014-07-23 08:25:00  0.216
102   2014-07-23 08:30:00  0.228
103   2014-07-23 08:35:00  0.324
104   2014-07-23 08:40:00  0.336
105   2014-07-23 08:45:00  0.324
106   2014-07-23 08:50:00  0.324
107   2014-07-23 08:55:00  0.324
like image 189
EdChum Avatar answered Nov 18 '22 12:11

EdChum


You can also use query:

In [25]: df.query('"2014-07-23 07:55:00" <= dt <= "2014-07-23 08:20:00"')
Out[25]: 
                     dt  value
95  2014-07-23 07:55:00  0.228
96  2014-07-23 08:00:00  0.228
97  2014-07-23 08:05:00  0.228
98  2014-07-23 08:10:00  0.228
99  2014-07-23 08:15:00  0.240
100 2014-07-23 08:20:00  0.228
like image 4
rachwa Avatar answered Nov 18 '22 14:11

rachwa


The answer above is right, but for people who just like me stumble upon this question more than 5 years after it was posted I want to add this remark.

If you want to filter on a sorted column (and timestamps tend to be like one) it is more efficient to use the searchsorted function of pandas Series to reach O(log(n)) complexity instead of O(n).

The example below gives as a result in a difference of much more than a factor 1000. This difference can be made arbitrarily large due to the difference in complexity off course, but the chosen numbers are the ones I was using when I stumbled upon this question.

import pandas as pd
import numpy as np
import timeit

N = 500000
M = 200

data = np.hstack([np.arange(0.,N).reshape(N,1),np.random.randn(N,M-1)])
df = pd.DataFrame(data,columns=["column"+str(i) for i in range(M)])

def return_first(df):
    return df[(df['column0'] > 100.5) & (df['column0'] < 400000.5)]

def return_second(df):
    t1 = df['column0'].searchsorted(100.5)
    t2 = df['column0'].searchsorted(400000.5)
    return df.loc[t1:t2-1]


if __name__ == '__main__':
    t = timeit.timeit(lambda: return_first(df), number=100)
    print(t)
    t = timeit.timeit(lambda: return_second(df), number=100)
    print(t)

results:

59.1751627
0.015401400000001786
like image 6
DwightFromTheOffice Avatar answered Nov 18 '22 14:11

DwightFromTheOffice