
How to do forward filling for each group in pandas

I have a dataframe similar to the one below:

id A   B   C   D E
1  2   3   4   5 5
1  NaN 4   NaN 6 7
2  3   4   5   6 6
2  NaN NaN 5   4 1

I want to impute the null values in columns A, B, and C by forward filling, but separately for each group. That is, the forward fill should be applied within each id, so values are never carried over from one id to the next. How can I do that?

asked Dec 09 '18 at 21:12 by HHH



1 Answer

Use GroupBy.ffill to forward fill the selected columns within each group. If the first value in a group is NaN, there is nothing earlier in that group to fill from, so it stays NaN. To handle that case, you can follow up with fillna and then cast to integers:

print (df)
   id    A    B    C  D    E
0   1  2.0  3.0  4.0  5  NaN
1   1  NaN  4.0  NaN  6  NaN
2   2  3.0  4.0  5.0  6  6.0
3   2  NaN  NaN  5.0  4  1.0

cols = ['A','B','C']
# In recent pandas versions GroupBy.ffill returns only the selected columns
# (not the grouping key 'id'), so assign back to cols only
df[cols] = df.groupby('id')[cols].ffill().fillna(0).astype(int)
print (df)
   id  A  B  C  D    E
0   1  2  3  4  5  NaN
1   1  2  4  4  6  NaN
2   2  3  4  5  6  6.0
3   2  3  4  5  4  1.0

Detail:

print (df.groupby('id')[cols].ffill().fillna(0).astype(int))
   A  B  C
0  2  3  4
1  2  4  4
2  3  4  5
3  3  4  5

Or use DataFrame.update, which modifies df in place (the filled columns stay floats):

cols = ['A','B','C']
df.update(df.groupby('id')[cols].ffill().fillna(0))
print (df)
   id    A    B    C  D    E
0   1  2.0  3.0  4.0  5  NaN
1   1  2.0  4.0  4.0  6  NaN
2   2  3.0  4.0  5.0  6  6.0
3   2  3.0  4.0  5.0  4  1.0
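
The sample data above has no group that starts with a NaN, so it does not show why the fillna step matters. Below is a minimal sketch, using made-up data (an assumption, not the asker's frame), that compares a plain ffill with a grouped one. The names plain, grouped, and filled are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the group with id 2 starts with a NaN in column A
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "A": [2.0, np.nan, np.nan, 3.0],
    "B": [3.0, 4.0, 4.0, np.nan],
})
cols = ["A", "B"]

# Plain ffill ignores groups: row 2 (id 2) inherits A from row 1 (id 1)
plain = df[cols].ffill()

# Grouped ffill only looks back within the same id, so row 2 keeps its NaN
grouped = df.groupby("id")[cols].ffill()

# Replace the remaining leading NaNs with 0, then cast to integers
filled = grouped.fillna(0).astype(int)

print(plain)
print(grouped)
print(filled)
```

Whether 0 is a sensible replacement for a leading NaN depends on the data. If it is not, skip fillna and astype(int) and keep the columns as floats.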
answered Sep 19 '22 at 05:09 by jezrael