Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Track an operation in Dataframe

My dataframes

df1:

Index                 Amount               
01.01.2018 08:00:00   23.25
01.01.2018 08:10:00   25.50
01.01.2018 08:20:00   26.30
01.01.2018 08:30:00   25.00
01.01.2018 08:40:00   20.00
01.01.2018 08:50:00   21.20
01.01.2018 09:00:00   21.20
01.01.2018 09:10:00   31.20

df2:

Index       Operation
01.01.2018  -5.00
01.01.2018  10.00

I'd like to track operations from df2 in my df1.

So basically check Operations in df2, and find where this event happened in df1. For ex. there is -5.00, and this event happened here:

01.01.2018 08:30:00   25.00
01.01.2018 08:40:00   20.00

My excpected output:

df:

Index                 Amount  Operation_T/F  Amount_Operation              
01.01.2018 08:00:00   23.25   0              0
01.01.2018 08:10:00   25.50   0              0
01.01.2018 08:20:00   26.30   0              0
01.01.2018 08:30:00   25.00   0              0
01.01.2018 08:40:00   20.00   1              -5.0
01.01.2018 08:50:00   21.20   0              0
01.01.2018 09:00:00   21.20   0              0
01.01.2018 09:10:00   31.20   1              10.0

The fact that operations can be a duplicate during the day is not a problem. Of course playing with some for's and if's could be a solution but I am trying to achieve a clean code in python and I'm thinking a way to do it nicer.

Had some issues with writing that True or False value for operation, if in row or row + 1. My idea to solve that was to create bins for two rows, and then track if operation event occurred in that bin. What do you think?

Thanks in advance :)

like image 415
Luigi Avatar asked May 15 '26 20:05

Luigi


2 Answers

Here is one approach using diff to check where the first diferences in df2.Operation are equal to df2.Operation and leveraging broadcasting:

m = df1.Amount.diff().values == df2.Operation.values[:,None]
df1['Operation_T/F'] = m.sum(0)
df1['Amount_Operation'] = (m * df2.Operation.values[:,None]).sum(0)

         Index         Amount         Operation_T/F  Amount_Operation
0 2018-01-01 08:00:00   23.25              0               0.0
1 2018-01-01 08:10:00   25.50              0               0.0
2 2018-01-01 08:20:00   26.30              0               0.0
3 2018-01-01 08:30:00   25.00              0               0.0
4 2018-01-01 08:40:00   20.00              1              -5.0
5 2018-01-01 08:50:00   21.20              0               0.0
6 2018-01-01 09:00:00   21.20              0               0.0
7 2018-01-01 09:10:00   31.20              1              10.0
like image 155
yatu Avatar answered May 18 '26 09:05

yatu


IIUC, you want to merge on the difference of the Amount and the date:

df1['date'] = df1.index.floor('D')
df1['Amount_Operation'] = df1.Amount.diff()


df = (df1.reset_index()
         .merge(df2.reset_index(), 
              left_on=['date', 'Amount_Operation'],
              right_on=['Index','Operation'],
              left_index=True,
              suffixes=['','_y'],
              how='left')
         .drop(['Index_y', 'date'], axis=1)
     )

df['Operation_T/F'] = df.Operation.isna()
df['Amount_Operation'] = df.Operation.fillna(0)

Output:

                Index  Amount  Amount_Operation  Operation  Operation_T/F
1 2018-01-01 08:00:00   23.25               0.0        NaN           True
1 2018-01-01 08:10:00   25.50               0.0        NaN           True
1 2018-01-01 08:20:00   26.30               0.0        NaN           True
1 2018-01-01 08:30:00   25.00               0.0        NaN           True
0 2018-01-01 08:40:00   20.00              -5.0       -5.0          False
1 2018-01-01 08:50:00   21.20               0.0        NaN           True
1 2018-01-01 09:00:00   21.20               0.0        NaN           True
1 2018-01-01 09:10:00   31.20              10.0       10.0          False
like image 20
Quang Hoang Avatar answered May 18 '26 09:05

Quang Hoang