My dataframes
df1:
Index Amount 01.01.2018 08:00:00 23.25 01.01.2018 08:10:00 25.50 01.01.2018 08:20:00 26.30 01.01.2018 08:30:00 25.00 01.01.2018 08:40:00 20.00 01.01.2018 08:50:00 21.20 01.01.2018 09:00:00 21.20 01.01.2018 09:10:00 31.20
df2:
Index Operation 01.01.2018 -5.00 01.01.2018 10.00
I'd like to track operations from df2 in my df1.
So basically check Operations in df2, and find where this event happened in df1. For ex. there is -5.00, and this event happened here:
01.01.2018 08:30:00 25.00 01.01.2018 08:40:00 20.00
My excpected output:
df:
Index Amount Operation_T/F Amount_Operation 01.01.2018 08:00:00 23.25 0 0 01.01.2018 08:10:00 25.50 0 0 01.01.2018 08:20:00 26.30 0 0 01.01.2018 08:30:00 25.00 0 0 01.01.2018 08:40:00 20.00 1 -5.0 01.01.2018 08:50:00 21.20 0 0 01.01.2018 09:00:00 21.20 0 0 01.01.2018 09:10:00 31.20 1 10.0
The fact that operations can be a duplicate during the day is not a problem. Of course playing with some for's and if's could be a solution but I am trying to achieve a clean code in python and I'm thinking a way to do it nicer.
Had some issues with writing that True or False value for operation, if in row or row + 1.
My idea to solve that was to create bins for two rows, and then track if operation event occurred in that bin. What do you think?
Thanks in advance :)
Here is one approach using diff to check where the first diferences in df2.Operation are equal to df2.Operation and leveraging broadcasting:
m = df1.Amount.diff().values == df2.Operation.values[:,None]
df1['Operation_T/F'] = m.sum(0)
df1['Amount_Operation'] = (m * df2.Operation.values[:,None]).sum(0)
Index Amount Operation_T/F Amount_Operation
0 2018-01-01 08:00:00 23.25 0 0.0
1 2018-01-01 08:10:00 25.50 0 0.0
2 2018-01-01 08:20:00 26.30 0 0.0
3 2018-01-01 08:30:00 25.00 0 0.0
4 2018-01-01 08:40:00 20.00 1 -5.0
5 2018-01-01 08:50:00 21.20 0 0.0
6 2018-01-01 09:00:00 21.20 0 0.0
7 2018-01-01 09:10:00 31.20 1 10.0
IIUC, you want to merge on the difference of the Amount and the date:
df1['date'] = df1.index.floor('D')
df1['Amount_Operation'] = df1.Amount.diff()
df = (df1.reset_index()
.merge(df2.reset_index(),
left_on=['date', 'Amount_Operation'],
right_on=['Index','Operation'],
left_index=True,
suffixes=['','_y'],
how='left')
.drop(['Index_y', 'date'], axis=1)
)
df['Operation_T/F'] = df.Operation.isna()
df['Amount_Operation'] = df.Operation.fillna(0)
Output:
Index Amount Amount_Operation Operation Operation_T/F
1 2018-01-01 08:00:00 23.25 0.0 NaN True
1 2018-01-01 08:10:00 25.50 0.0 NaN True
1 2018-01-01 08:20:00 26.30 0.0 NaN True
1 2018-01-01 08:30:00 25.00 0.0 NaN True
0 2018-01-01 08:40:00 20.00 -5.0 -5.0 False
1 2018-01-01 08:50:00 21.20 0.0 NaN True
1 2018-01-01 09:00:00 21.20 0.0 NaN True
1 2018-01-01 09:10:00 31.20 10.0 10.0 False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With