I have the below dataframe df:
| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1 | 3/29/2016 | 4/23/2016 |
| 1 | 3/29/2016 | 3/29/2016 |
| 1 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 |
| 2 | 5/15/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 |
| 3 | 7/24/2016 | 4/21/2016 |
I want to get the Min and Max "Time_Stamp Date" partition by "Staff_ID" such that the resultant dataframe is as follows:
| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1 | 3/29/2016 | 4/23/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 3/29/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 6/21/2016 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 | 4/1/2016 | 5/25/2016 |
| 2 | 5/15/2016 | 5/25/2016 | 4/1/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 4/21/2016 | 4/21/2016 | 6/21/2016 |
How can I do this in Python ?
Let's use groupby
with transform
and assign
:
g = df.groupby('Staff_ID')['Time_Stamp']
df.assign(Min_Time_Stamp = g.transform(min), Max_Time_Stamp = g.transform(max))
Output:
Staff_ID Join_Date Time_Stamp Max_Time_Stamp Min_Time_Stamp
1 1 3/29/2016 4/23/2016 6/21/2016 3/29/2016
2 1 3/29/2016 3/29/2016 6/21/2016 3/29/2016
3 1 3/29/2016 6/21/2016 6/21/2016 3/29/2016
4 2 5/15/2016 4/1/2016 5/25/2016 4/1/2016
5 2 5/15/2016 5/25/2016 5/25/2016 4/1/2016
6 3 7/24/2016 6/21/2016 6/21/2016 4/21/2016
7 3 7/24/2016 6/10/2016 6/21/2016 4/21/2016
8 3 7/24/2016 4/21/2016 6/21/2016 4/21/2016
@CarlesMitjans method:
10 loops, best of 3: 33.3 ms per loop
@ScottBoston method:
100 loops, best of 3: 5.52 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With