Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare current row value to previous row values

I have login history data from User A for a day. My requirement is that at any point in time the User A can have only one valid login. As in the samples below, the user may have attempted to login successfully multiple times, while his first session was still active. So, any logins that happened during the valid session needs to be flagged as duplicate.

Example 1:

In the first sample data below, while the user was still logged in from 00:12:38 to 01:00:02 (index 0), there is another login from the user at 00:55:14 to 01:00:02 (index 1).

Similarly, if we compare index 2 and 3, we can see that the record at index 3 is duplicate login as per requirement.

  start_time  end_time
0   00:12:38  01:00:02
1   00:55:14  01:00:02
2   01:00:02  01:32:40
3   01:00:02  01:08:40
4   01:41:22  03:56:23
5   18:58:26  19:16:49
6   20:12:37  20:52:49
7   20:55:16  22:02:50
8   22:21:24  22:48:50
9   23:11:30  00:00:00

Expected output:

  start_time  end_time   isDup
0   00:12:38  01:00:02       0
1   00:55:14  01:00:02       1
2   01:00:02  01:32:40       0
3   01:00:02  01:08:40       1
4   01:41:22  03:56:23       0
5   18:58:26  19:16:49       0
6   20:12:37  20:52:49       0
7   20:55:16  22:02:50       0
8   22:21:24  22:48:50       0
9   23:11:30  00:00:00       0

These duplicate records need to be updated to 1 at column isDup.


Example 2:

Another sample of data as below. Here, while the user was still logged in between 13:36:10 and 13:50:16, there were 3 additional sessions too that needs to be flagged.

  start_time  end_time
0   13:32:54  13:32:55
1   13:36:10  13:50:16
2   13:37:54  13:38:14
3   13:46:38  13:46:45
4   13:48:59  13:49:05
5   13:50:16  13:50:20
6   14:03:39  14:03:49
7   15:36:20  15:36:20
8   15:46:47  15:46:47

Expected output:

  start_time    end_time    isDup
0   13:32:54    13:32:55    0
1   13:36:10    13:50:16    0
2   13:37:54    13:38:14    1
3   13:46:38    13:46:45    1
4   13:48:59    13:49:05    1
5   13:50:16    13:50:20    0
6   14:03:39    14:03:49    0
7   15:36:20    15:36:20    0
8   15:46:47    15:46:47    0

What's the efficient way to compare the start time of the current record with previous records?

like image 305
Saranya Krishnamurthy Avatar asked Sep 11 '20 04:09

Saranya Krishnamurthy


People also ask

How can I compare one row with previous row in SQL?

Overview of SQL Server LAG() function In other words, by using the LAG() function, from the current row, you can access data of the previous row, or the row before the previous row, and so on. The LAG() function can be very useful for comparing the value of the current row with the value of the previous row.

What is lag function in SQL?

LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.


2 Answers

Query duplicated() and change astype to int

df['isDup']=(df['Start time'].duplicated(False)|df['End time'].duplicated(False)).astype(int)

Or did you need

df['isDup']=(df['Start time'].between(df['Start time'].shift(),df['End time'].shift())).astype(int)
like image 186
wwnde Avatar answered Nov 15 '22 04:11

wwnde


Map the time like values in columns start_time and end_time to pandas TimeDelta objects and subtract 1 seconds from the 00:00:00 timedelta values in end_time column.

c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T
e[e == pd.Timedelta(0)] += pd.Timedelta(days=1, seconds=-1)

Then for each pair of start_time and end_time in the dataframe df mark the corresponding duplicate intervals using numpy broadcasting:

m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)
df['isDupe'] = (m.any(1) & ~df[c].duplicated(keep=False)).view('i1')

# example 1
  start_time  end_time  isDupe
0   00:12:38  01:00:02       0
1   00:55:14  01:00:02       1
2   01:00:02  01:32:40       0
3   01:00:02  01:08:40       1
4   01:41:22  03:56:23       0
5   18:58:26  19:16:49       0
6   20:12:37  20:52:49       0
7   20:55:16  22:02:50       0
8   22:21:24  22:48:50       0
9   23:11:30  00:00:00       0

# example 2
  start_time  end_time  isDupe
0   13:32:54  13:32:55       0
1   13:36:10  13:50:16       0
2   13:37:54  13:38:14       1
3   13:46:38  13:46:45       1
4   13:48:59  13:49:05       1
5   13:50:16  13:50:20       0
6   14:03:39  14:03:49       0
7   15:36:20  15:36:20       0
8   15:46:47  15:46:47       0
like image 42
Shubham Sharma Avatar answered Nov 15 '22 04:11

Shubham Sharma