I have a dataframe with a column that randomly starts a "count" back at 1. My goal is to produce a new_col that divides my current column by the the last value in a count. See below for an example.
This is my current DataFrame:
col
0 1.0
1 2.0
2 3.0
3 1.0
4 2.0
5 1.0
6 2.0
7 3.0
8 4.0
9 5.0
10 1.0
11 2.0
12 3.0
Trying to get an output like so:
col new_col
0 1.0 0.333
1 2.0 0.667
2 3.0 1.000
3 1.0 0.500
4 2.0 1.000
5 1.0 0.200
6 2.0 0.400
7 3.0 0.600
8 4.0 0.800
9 5.0 1.000
10 1.0 0.333
11 2.0 0.667
12 3.0 1.000
This is what I have tried so far:
df['col_bool'] = pd.DataFrame(df['col'] == 1.0)
idx_lst = [x - 2 for x in df.index[df['col_bool']].tolist()]
idx_lst = idx_lst[1:]
mask = (df['col'] != 1.0)
df_valid = df[mask]
for i in idx_lst:
df['new_col'] = 1.0 / df_valid.iloc[i]['col']
df.loc[mask, 'new_col'] = df_valid['col'] / df_valid.iloc[i]['col']
This understandably results in an index error. Maybe I need to make a copy of a DataFrame each time and concat. I believe this would work but I want to ask if I am missing any shortcuts here?
Try:
df['new_col'] = df['col'].div(df.groupby((df['col'] == 1).cumsum()).transform('last'))
Output:
col new_col
0 1.0 0.333333
1 2.0 0.666667
2 3.0 1.000000
3 1.0 0.500000
4 2.0 1.000000
5 1.0 0.200000
6 2.0 0.400000
7 3.0 0.600000
8 4.0 0.800000
9 5.0 1.000000
10 1.0 0.333333
11 2.0 0.666667
12 3.0 1.000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With