I have a dataframe df_in like so:
import pandas as pd
dic_in = {'A':['aa','aa','bb','cc','cc','cc','cc','dd','dd','dd','ee'],
'B':['200','200','200','400','400','500','700','700','900','900','200'],
'C':['da','cs','fr','fs','se','at','yu','j5','31','ds','sz']}
df_in = pd.DataFrame(dic_in)
I would like to investigate the 2 columns A and B in the following way.
I 2 consecutive rows[['A','B']]
are equal then they are assigned a new value (according to a specific rule which i am about to describe).
I will give an example to be more clear: If the first row[['A','B']]
is equal to the following one, then I set 1
; if the second one is equal to the third one then I will set 1
. Every time two consecutive rows are different, then I increase the value to set by 1
.
The result should look like this:
A B C value
0 aa 200 da 1
1 aa 200 cs 1
2 bb 200 fr 2
3 cc 400 fs 3
4 cc 400 se 3
5 cc 500 at 4
6 cc 700 yu 5
7 dd 700 j5 6
8 dd 900 31 7
9 dd 900 ds 7
10 ee 200 sz 8
Can you suggest me a smart one to achieve this goal?
Use shift
and any
to compare consecutive rows, using True
to indicate where the value should change. Then take the cumulative sum with cumsum
to get the increasing value:
df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1)
df_in['value'] = df_in['value'].cumsum()
Alternatively, condensing it to one line:
df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1).cumsum()
The resulting output:
A B C value
0 aa 200 da 1
1 aa 200 cs 1
2 bb 200 fr 2
3 cc 400 fs 3
4 cc 400 se 3
5 cc 500 at 4
6 cc 700 yu 5
7 dd 700 j5 6
8 dd 900 31 7
9 dd 900 ds 7
10 ee 200 sz 8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With