
How to fill values in a Dataframe depending on values around it

Tags: python, pandas

I have a dataframe that looks something like this:

1   2  3  'String'
''  4  X  ''
''  5  X  ''
''  6  7  'String'
''  1  Y  ''

I want to replace the Xs and Ys (placeholders used here for illustration) with the value from the same column in the most recent row whose last column is 'String'. So the Xs would become 3, and the Y would become 7:

1  2  3 'String'
'' 4  3 ''
'' 5  3 ''
'' 6  7 'String'
'' 1  7 ''

The reference value stays the same until the next 'parent' row appears. So the first 3 is kept until another 'String' parent row comes along.

I tried building another dataframe with the positions of the 'String' rows and filling each range from one index to the next with the value, but it's too slow.

This is really similar to a forward fill (DataFrame.ffill()), but not exactly, and I don't know whether it's feasible to turn my problem into an ffill() problem.

Lucas P asked Sep 20 '25 12:09


1 Answer

Updated solution:

This can be solved with .ffill(). First, replace the placeholder values in column C with NaN on every row where column D is not 'String' (np here is NumPy, imported with import numpy as np):

df.loc[df['D'] != 'String', 'C'] = np.nan

This selects the rows where df['D'] is not 'String' and sets column C to NaN in those rows, so only the 'parent' rows keep their value.

The last step is to forward-fill column C with .ffill(), so each NaN takes the value of the most recent 'String' row above it:

df['C'] = df['C'].ffill()

Here is the final result:

>>> df
   C    D
0  3.0  String
1  3.0        
2  3.0        
3  7.0  String
4  7.0        
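
For anyone who wants to reproduce this, here is a minimal end-to-end sketch. The sample frame is an assumption modeled on the question: the column names C and D follow the answer's code, and the values 9.0, 8.0 and 5.0 are made-up stand-ins for the X and Y placeholders. Column C is created as float so that NaN can be assigned without changing its dtype. The sketch also shows a one-expression variant using Series.where(), which is not part of the original answer but should give the same result without modifying the column in two steps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 9.0, 8.0 and 5.0 stand in for the X/Y placeholders.
df = pd.DataFrame({
    "C": [3.0, 9.0, 8.0, 7.0, 5.0],
    "D": ["String", "", "", "String", ""],
})

# Boolean mask marking the 'parent' rows.
is_parent = df["D"] == "String"

# One-expression variant: where() keeps C on parent rows and yields NaN
# elsewhere, then ffill() carries each parent value down to the rows below.
filled = df["C"].where(is_parent).ffill()

# Two-step version from the answer: blank out non-parent rows, then forward-fill.
df.loc[~is_parent, "C"] = np.nan
df["C"] = df["C"].ffill()

print(df)
```

Both approaches rely on the first row being a 'String' row; if it is not, the leading rows stay NaN because there is nothing above them to fill from.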
Aadvik answered Sep 23 '25 04:09