Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find 5 minutes gaps in a Pandas dataframe?

Tags:

python

pandas

I have pandas dataframe that looks similar to this:

                     TIMESTAMP  EVENT_COUNT
0          2014-07-23 04:28:23            1 
1          2014-07-23 04:28:24            1
2   2014-07-23 04:28:25.999000            4
3          2014-07-23 04:28:27            1
4   2014-07-23 04:28:28.999000            2
5          2014-07-23 04:28:30            1
6          2014-07-23 04:29:31            7
7          2014-07-23 04:29:33            1
8          2014-07-23 04:29:34            1
9          2014-07-23 04:29:36            1
10         2014-07-23 04:40:37            2
11         2014-07-23 04:40:39            1
12         2014-07-23 04:40:40            1
13         2014-07-23 04:40:42            1
14         2014-07-23 04:40:43            1
15  2014-07-23 04:40:44.999000            4
16         2014-07-23 04:41:46            1
17         2014-07-23 04:41:47            1
18         2014-07-23 04:41:49            1
19         2014-07-23 04:41:50            1
20         2014-07-23 04:50:52            9
21         2014-07-23 04:50:53            4
22         2014-07-23 04:50:55            6
23         2014-07-27 01:12:13            1

My end goal is to be able to find gaps in this that exceed 5 minutes. So, from above, I'd find a gap between:

2014-07-23 04:29:36 and 2014-07-23 04:40:37
2014-07-23 04:41:50 and 2014-07-23 04:50:52
2014-07-23 04:50:55 and 2014-07-27 01:12:13

Gaps of less than 5 minutes do not need to be identified. So the following wouldn't be recognized as a "gap".

2014-07-23 04:28:30 and 2014-07-23 04:29:31    (Only 61 seconds)
2014-07-23 04:40:37 and 2014-07-23 04:40:39    (Only 2 seconds)
2014-07-23 04:40:44.999000 and 2014-07-23 04:41:46 (Just over 61 seconds)

How can I find the gaps mentioned above? When I tried the solution mentioned in this answer, nothing seems to have changed. I used the following command:

df.reindex(pd.date_range(min(df['TIMESTAMP']), max(df['TIMESTAMP']), freq='5min')).fillna(0)

The dataframe looks the same after this command is run.

like image 354
NewGuy Avatar asked Oct 06 '15 15:10

NewGuy


People also ask

How do you find the minutes in a data frame?

Use min() function on a dataframe with 'axis = 1' attribute to find the minimum value over the row axis. 3) Get minimum values of every column without skipping None Value : Use min() function on a dataframe which has Na value with 'skipna = False' attribute to find the minimum value over the column axis.

What is the min () function in Pandas?

Pandas DataFrame min() Method The min() method returns a Series with the minimum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the minimum value for each row.


1 Answers

IIUC so long as the dtype are datetime64 already then you can just use diff which will create a timedelta and then call the attribute dt.seconds:

In [8]:
df['OVER 5 MINS'] = (df['TIMESTAMP'].diff()).dt.seconds > 300
df

Out[8]:
                    TIMESTAMP  EVENT_COUNT OVER 5 MINS
INDEX                                                 
0     2014-07-23 04:28:23.000            1       False
1     2014-07-23 04:28:24.000            1       False
2     2014-07-23 04:28:25.999            4       False
3     2014-07-23 04:28:27.000            1       False
4     2014-07-23 04:28:28.999            2       False
5     2014-07-23 04:28:30.000            1       False
6     2014-07-23 04:29:31.000            7       False
7     2014-07-23 04:29:33.000            1       False
8     2014-07-23 04:29:34.000            1       False
9     2014-07-23 04:29:36.000            1       False
10    2014-07-23 04:40:37.000            2        True
11    2014-07-23 04:40:39.000            1       False
12    2014-07-23 04:40:40.000            1       False
13    2014-07-23 04:40:42.000            1       False
14    2014-07-23 04:40:43.000            1       False
15    2014-07-23 04:40:44.999            4       False
16    2014-07-23 04:41:46.000            1       False
17    2014-07-23 04:41:47.000            1       False
18    2014-07-23 04:41:49.000            1       False
19    2014-07-23 04:41:50.000            1       False
20    2014-07-23 04:50:52.000            9        True
21    2014-07-23 04:50:53.000            4       False
22    2014-07-23 04:50:55.000            6       False
23    2014-07-27 01:12:13.000            1        True
like image 146
EdChum Avatar answered Oct 07 '22 16:10

EdChum