Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Order Pandas DataFrame by groups and Timestamp

Tags:

python

pandas

I have the below sample DataFrame

             Timestamp Item Char  Value
4  1/7/2020 1:22:22 AM    B  C.B    3.2
0  1/7/2020 1:23:23 AM    A  C.A    1.0
2  1/7/2020 1:23:23 AM    A  C.B    1.3
1  1/7/2020 1:23:24 AM    A  C.A    2.0
5  1/7/2020 1:23:29 AM    B  C.B    3.0
3  1/7/2020 1:25:23 AM    B  C.B    2.0

I would like to add a new column that tells the order an Item appears in the same Char, based on the Timestamp. In particular, I would like to assign 1 to the last value, 2 to the second-last value and so on.

The result should look like as follows

             Timestamp Item Char  Value   Order
0  1/7/2020 1:23:23 AM    A  C.A    1.0   2
1  1/7/2020 1:23:24 AM    A  C.A    2.0   1
2  1/7/2020 1:23:23 AM    A  C.B    1.3   1 
3  1/7/2020 1:22:22 AM    B  C.B    3.2   3
4  1/7/2020 1:23:29 AM    B  C.B    3.0   2
5  1/7/2020 1:25:23 AM    B  C.B    2.0   1

As you see the B item appears several times in the Char C.B. I would assign 1 to the most recent value based on the Timestamp.

My idea is to group the DataFrame by Item and by Char, then order the rows of each group by the Timestamp in descending order, finally assign 1 to the first row, 2 to the second and so on. But I don´t actually know how to do this.

Can you help me out?

Thank you very much!

like image 296
Daniel Zito Avatar asked Oct 04 '20 13:10

Daniel Zito


2 Answers

Let's groupby the column Timestamp on Char and Item and compute the rank using method=first, then use sort_values to sort the dataframe based on Char and Item:

df['Order'] = pd.to_datetime(df['Timestamp'])\
              .groupby([df['Char'], df['Item']])\
              .rank(method='first', ascending=False)

df = df.sort_values(['Char', 'Item'], ignore_index=True)

             Timestamp Item Char  Value  Order
0  1/7/2020 1:23:23 AM    A  C.A    1.0    2.0
1  1/7/2020 1:23:24 AM    A  C.A    2.0    1.0
2  1/7/2020 1:23:23 AM    A  C.B    1.3    1.0
3  1/7/2020 1:22:22 AM    B  C.B    3.2    3.0
4  1/7/2020 1:23:29 AM    B  C.B    3.0    2.0
5  1/7/2020 1:25:23 AM    B  C.B    2.0    1.0
like image 123
Shubham Sharma Avatar answered Oct 07 '22 12:10

Shubham Sharma


Sort and Transform

 df = df.sort_values(['Timestamp'],ascending=False)
 df['Order'] = df.groupby(['Item', 'Char'])['Value'].transform(
     lambda x: np.arange(1, len(x)+1))

Sammple:

import pandas as pd
from io import StringIO 
data = StringIO("""
,Timestamp,Item,Char,Value
0,1/7/2020 1:22:22 AM,B,C.B,3.2
1,1/7/2020 1:23:23 AM,A,C.A,1.0
2,1/7/2020 1:23:23 AM,A,C.B,1.3
3,1/7/2020 1:23:24 AM,A,C.A,2.0
4,1/7/2020 1:23:29 AM,B,C.B,3.0
5,1/7/2020 1:25:23 AM,B,C.B,2.0
""" )
df = pd.read_csv(data, index_col=0)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])


df = df.sort_values(['Timestamp'],ascending=False)
df['Order'] = df.groupby(['Item', 'Char'])['Value'].transform(
     lambda x: np.arange(1, len(x)+1))
 
print (df.sort_values(['Item', 'Timestamp']))

output:

            Timestamp Item Char  Value  Order
1 2020-01-07 01:23:23    A  C.A    1.0    2.0
2 2020-01-07 01:23:23    A  C.B    1.3    1.0
3 2020-01-07 01:23:24    A  C.A    2.0    1.0
0 2020-01-07 01:22:22    B  C.B    3.2    3.0
4 2020-01-07 01:23:29    B  C.B    3.0    2.0
5 2020-01-07 01:25:23    B  C.B    2.0    1.0
like image 28
mujjiga Avatar answered Oct 07 '22 14:10

mujjiga