I have the following DataFrame, to which I want to apply `bfill` as follows:
| 'amount' | 'percentage' |
|---|---|
| NaN | |
| 1.0 | 20 |
| 2.0 | 10 |
| NaN | |
| NaN | |
| NaN | |
| NaN | |
| 3.0 | 50 |
| 4.0 | 10 |
| NaN | |
| 5.0 | 10 |
I want to backfill the NaNs in the `amount` column according to the `percentage` column: if the percentage next to a number is 50, then only 50% of the NaNs directly before that number are filled (a partial fill). For example, the amount 3.0 has a percentage of 50, so of the 4 NaN entries before it, only 2 (50%) should be backfilled.
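To make the example reproducible, the input table can be built roughly like this. It is a sketch that assumes the blank `percentage` cells are NaN.

```python
import numpy as np
import pandas as pd

# Blank percentage cells in the table are assumed to be NaN
df = pd.DataFrame({
    'amount': [np.nan, 1.0, 2.0, np.nan, np.nan, np.nan, np.nan, 3.0, 4.0, np.nan, 5.0],
    'percentage': [np.nan, 20, 10, np.nan, np.nan, np.nan, np.nan, 50, 10, np.nan, 10],
})
print(df)
```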
Proposed output:
| 'amount' | 'percentage' |
|---|---|
| NaN | |
| 1.0 | 20 |
| 2.0 | 10 |
| NaN | |
| NaN | |
| 3.0 | |
| 3.0 | |
| 3.0 | 50 |
| 4.0 | 10 |
| NaN | |
| 5.0 | 10 |
Please help.
Create groups so that each run of NaNs gets the same id as the non-NaN value that follows it:
```python
df['group_id'] = df.amount.where(df.amount.isna(), 1).cumsum().bfill()
```
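To see why that one-liner produces the right groups, here is a sketch that runs each step on the `amount` column alone (the intermediate names `step` and `counter` are illustrative only):

```python
import numpy as np
import pandas as pd

amount = pd.Series([np.nan, 1.0, 2.0, np.nan, np.nan, np.nan, np.nan, 3.0, 4.0, np.nan, 5.0])

# NaNs stay NaN, every real value becomes 1
step = amount.where(amount.isna(), 1)
# cumsum skips NaNs, so the counter only advances at real values
counter = step.cumsum()
# bfill hands each NaN the id of the next real value
group_id = counter.bfill()
print(pd.DataFrame({'amount': amount, 'step': step, 'counter': counter, 'group_id': group_id}))
```

The four NaNs before 3.0 end up with the same id (3.0) as the 3.0 row, so each group is "a run of NaNs plus the value that closes it".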
Create a filling function
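```python
def custom_fill(x):
    # Number of NaN rows in the group (every row except the closing value)
    nan_rows = x.shape[0] - 1
    # Rows to fill = percentage of the closing value times the NaN count, rounded down
    max_fill_rows = math.floor(x['percentage'].iloc[-1] * nan_rows / 100)
    # Fill only if the number of rows to fill is not zero
    return x.bfill(limit=max_fill_rows) if max_fill_rows else x
```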
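For the group that ends in 3.0, the arithmetic inside `custom_fill` works out as in this small sketch, which applies the same formula to a standalone Series and uses `bfill(limit=...)` to cap how many NaNs are filled:

```python
import math
import numpy as np
import pandas as pd

# The group for amount 3.0: four NaNs followed by the value itself
group = pd.Series([np.nan, np.nan, np.nan, np.nan, 3.0])
percentage = 50

# Rows to fill = floor(50 * 4 / 100) = 2
limit = math.floor(percentage * (len(group) - 1) / 100)
filled = group.bfill(limit=limit)
print(limit)            # 2
print(filled.tolist())  # [nan, nan, 3.0, 3.0, 3.0]
```

`bfill(limit=2)` fills at most two consecutive NaNs upward from 3.0, which is exactly the "50% of 4" partial fill.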
Fill the DataFrame
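```python
# group_keys=False keeps the original flat index (newer pandas would otherwise prepend group_id)
df = df.groupby('group_id', group_keys=False).apply(custom_fill)
```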
Output
```
    amount  percentage  group_id
0      NaN         NaN       1.0
1      1.0        20.0       1.0
2      2.0        10.0       2.0
3      NaN         NaN       3.0
4      NaN         NaN       3.0
5      3.0        50.0       3.0
6      3.0        50.0       3.0
7      3.0        50.0       3.0
8      4.0        10.0       4.0
9      NaN         NaN       5.0
10     5.0        10.0       5.0
```
PS: Don't forget to import the required libraries:

```python
import math
```
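Because `custom_fill` backfills the whole group, the `percentage` column is filled as well (50.0 on rows 5 and 6), whereas the proposed output leaves it blank. If only `amount` should change, a vectorized variant along these lines could work. It is a sketch that swaps `groupby.apply` for `cumcount` and `transform`, and it assumes, as in the sample, that the `percentage` of a NaN row is also NaN, so `'last'` picks the closing row's value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'amount': [np.nan, 1.0, 2.0, np.nan, np.nan, np.nan, np.nan, 3.0, 4.0, np.nan, 5.0],
    'percentage': [np.nan, 20, 10, np.nan, np.nan, np.nan, np.nan, 50, 10, np.nan, 10],
})

# Same grouping idea: each run of NaNs shares an id with the value that ends it
group_id = df['amount'].where(df['amount'].isna(), 1).cumsum().bfill().rename('group_id')
groups = df.groupby(group_id)

# 0 for the closing row, 1 for the row just above it, and so on
steps_back = groups.cumcount(ascending=False)
# Number of NaN rows in each group (group size minus the closing value)
n_nan = groups.cumcount() + steps_back
# Percentage and amount of the closing value ('last' = last non-null value)
pct = groups['percentage'].transform('last')
closing_amount = groups['amount'].transform('last')
# How many NaNs directly above the closing value may be filled
limit = np.floor(pct * n_nan / 100)

# Fill only NaN amounts that are close enough to the closing value
to_fill = df['amount'].isna() & (steps_back <= limit)
df['amount'] = df['amount'].mask(to_fill, closing_amount)
print(df)
```

Rows 5 and 6 get `amount` 3.0 while their `percentage` stays NaN, matching the proposed output.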