pandas: selecting rows in a specific time window

I have a dataset of samples covering multiple days, each with a timestamp. I want to select rows within a specific time window, e.g. all rows that were generated between 1 pm and 3 pm every day.

This is a sample of my data in a pandas dataframe:

22           22  2018-04-12T20:14:23Z  2018-04-12T21:14:23Z      0  6370.1   
23           23  2018-04-12T21:14:23Z  2018-04-12T21:14:23Z      0  6368.8   
24           24  2018-04-12T22:14:22Z  2018-04-13T01:14:23Z      0  6367.4   
25           25  2018-04-12T23:14:22Z  2018-04-13T01:14:23Z      0  6365.8   
26           26  2018-04-13T00:14:22Z  2018-04-13T01:14:23Z      0  6364.4   
27           27  2018-04-13T01:14:22Z  2018-04-13T01:14:23Z      0  6362.7   
28           28  2018-04-13T02:14:22Z  2018-04-13T05:14:22Z      0  6361.0   
29           29  2018-04-13T03:14:22Z  2018-04-13T05:14:22Z      0  6359.3   
..          ...                   ...                   ...    ...     ...   
562         562  2018-05-05T08:13:21Z  2018-05-05T09:13:21Z      0  6300.9   
563         563  2018-05-05T09:13:21Z  2018-05-05T09:13:21Z      0  6300.7   
564         564  2018-05-05T10:13:14Z  2018-05-05T13:13:14Z      0  6300.2   
565         565  2018-05-05T11:13:14Z  2018-05-05T13:13:14Z      0  6299.9   
566         566  2018-05-05T12:13:14Z  2018-05-05T13:13:14Z      0  6299.6   

How do I achieve that? I need to ignore the date and evaluate only the time component. I could traverse the dataframe in a loop and check each timestamp that way, but there must be a simpler way to do that.

I converted messageDate, which was read as a string, to a datetime with

df["messageDate"]=pd.to_datetime(df["messageDate"])

But after that I got stuck on how to filter on time only.

Any input appreciated.

Hans van Schaick asked May 09 '18 09:05



1 Answer

Datetime columns expose a DatetimeProperties object through the .dt accessor, from which you can extract time components (such as .dt.hour or .dt.time) and filter on them:

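import pandas as pd

df = pd.DataFrame(
    [
        '2018-04-12T12:00:00Z', '2018-04-12T14:00:00Z', '2018-04-12T20:00:00Z',
        '2018-04-13T12:00:00Z', '2018-04-13T14:00:00Z', '2018-04-13T20:00:00Z'
    ],
    columns=['messageDate']
)

# Parse the ISO 8601 strings first so the .dt accessor is available
df["messageDate"] = pd.to_datetime(df["messageDate"])
df
#           messageDate
# 0 2018-04-12 12:00:00
# 1 2018-04-12 14:00:00
# 2 2018-04-12 20:00:00
# 3 2018-04-13 12:00:00
# 4 2018-04-13 14:00:00
# 5 2018-04-13 20:00:00
# (recent pandas versions parse the trailing Z as UTC and may print a +00:00 suffix)

# Keep rows whose hour is 13 or 14, i.e. 13:00:00 through 14:59:59.
# Using <= 15 here would also let 15:00:00 through 15:59:59 through.
time_mask = (df['messageDate'].dt.hour >= 13) & \
            (df['messageDate'].dt.hour < 15)

df[time_mask]
#           messageDate
# 1 2018-04-12 14:00:00
# 4 2018-04-13 14:00:00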
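
The hour-based mask works at whole-hour granularity. If the window has to be exact to the minute, or should include the end time itself, one option is to compare the time-of-day values directly, and another is to put the timestamps in the index and use between_time. The following is only a minimal sketch: the sample timestamps are made up for illustration, and the names start, end, by_time and by_between are not from the question.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the trailing Z is parsed as UTC,
# so every comparison below is on UTC time of day.
df = pd.DataFrame(
    {
        "messageDate": pd.to_datetime(
            [
                "2018-04-12T12:00:00Z", "2018-04-12T14:00:00Z", "2018-04-12T15:30:00Z",
                "2018-04-13T13:00:00Z", "2018-04-13T15:00:00Z", "2018-04-13T20:00:00Z",
            ]
        )
    }
)

start = datetime.time(13, 0)  # 1 pm
end = datetime.time(15, 0)    # 3 pm

# Option 1: compare the time-of-day component, both ends inclusive.
times = df["messageDate"].dt.time
by_time = df[(times >= start) & (times <= end)]

# Option 2: move the timestamps into the index and let pandas select
# by time of day; between_time includes both end points by default.
by_between = df.set_index("messageDate").between_time(start, end)

print(by_time)
print(by_between)
```

Both selections keep the 14:00, 13:00 and 15:00 rows and drop 12:00, 15:30 and 20:00. Option 2 only works on a DatetimeIndex, which is why the column is moved into the index first.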
Grigoriy Mikhalkin answered Nov 07 '22 19:11