I am trying to work out the time delta between values in a grouped pandas df.
My df looks like this:
Location ID Item Qty Time
0 7 202545942 100130 1 07:19:46
1 8 202545943 100130 1 07:20:08
2 11 202545950 100130 1 07:20:31
3 13 202545955 100130 1 07:21:08
4 15 202545958 100130 1 07:21:18
5 18 202545963 100130 3 07:21:53
6 217 202546320 100130 1 07:22:43
7 219 202546324 100130 1 07:22:54
8 229 202546351 100130 1 07:23:32
9 246 202546376 100130 1 07:24:09
10 273 202546438 100130 1 07:24:37
11 286 202546464 100130 1 07:24:59
12 296 202546490 100130 1 07:25:16
13 297 202546491 100130 1 07:25:24
14 310 202546516 100130 1 07:25:59
15 321 202546538 100130 1 07:26:17
16 329 202546549 100130 1 07:28:09
17 388 202546669 100130 1 07:29:02
18 420 202546717 100130 2 07:30:01
19 451 202546766 100130 1 07:30:19
20 456 202546773 100130 1 07:30:27
(...)
42688 458 202546777 999969 1 06:51:16
42689 509 202546884 999969 1 06:53:09
42690 567 202546977 999969 1 06:54:21
42691 656 202547104 999969 1 06:57:27
I have grouped this using the following method:
ndf = df.groupby(['ID','Location','Time'])
If I add .size()
to the end of the above and print(ndf)
I get the following output:
(...)
ID Location Time
995812 696 07:10:36 1
730 07:11:41 1
761 07:12:30 1
771 07:20:49 1
995820 381 06:55:07 1
761 07:12:44 1
(...)
This is the as desired.
My challenge is that I need to work out the time delta between each time per Item and add this as a column in the dataframe grouping. It should give me the following:
ID Location Time Delta
(...)
995812 696 07:10:36 0
730 07:11:41 00:01:05
761 07:12:30 00:00:49
771 07:20:49 00:08:19
995820 381 06:55:07 0
761 07:12:44 00:17:37
(...)
I am pulling my hair out trying to work out a method of doing this, so I'm turning to the greats.
Please help. Thanks in advance.
Convert Time
column to timedeltas by to_timedelta
, sort by all 3 columns by DataFrame.sort_values
, get difference per groups by DataFrameGroupBy.diff
, replace missing values to 0 timedelta
by Series.fillna
:
#if strings astype should be omit
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
df = df.sort_values(['ID','Location','Time'])
df['Delta'] = df.groupby('ID')['Time'].diff().fillna(pd.Timedelta(0))
Also is possible convert timedeltas to seconds - add Series.dt.total_seconds
:
df['Delta_sec'] = df.groupby('ID')['Time'].diff().dt.total_seconds().fillna(0)
If you just wanted to iterate over the groupby object, based on your original question title you can do it:
for (x, y) in df.groupby(['ID','Location','Time']):
print("{0}, {1}".format(x, y))
# your logic
However, this works for 10.000 rows, 100.000 rows, but not so good for 10^6 rows or more.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With