
Fill in missing pandas data with previous non-missing value, grouped by key

I am dealing with pandas DataFrames like this:

       id    x
    0   1   10
    1   1   20
    2   2  100
    3   2  200
    4   1  NaN
    5   2  NaN
    6   1  300
    7   1  NaN

I would like to replace each NaN 'x' with the previous non-NaN 'x' from a row with the same 'id' value:

       id    x
    0   1   10
    1   1   20
    2   2  100
    3   2  200
    4   1   20
    5   2  200
    6   1  300
    7   1  300

Is there some slick way to do this without manually looping over rows?

ChrisB asked May 02 '13 18:05



1 Answer

You could perform a groupby/forward-fill operation on each group:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1, 1, 2, 2, 1, 2, 1, 1],
                       'x': [10, 20, 100, 200, np.nan, np.nan, 300, np.nan]})
    # Forward-fill 'x' within each 'id' group; the original row order is kept.
    df['x'] = df.groupby(['id'])['x'].ffill()
    print(df)

yields

       id      x
    0   1   10.0
    1   1   20.0
    2   2  100.0
    3   2  200.0
    4   1   20.0
    5   2  200.0
    6   1  300.0
    7   1  300.0
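To see why the groupby matters here, a minimal sketch comparing a plain forward fill with the grouped one on the same data may help. The names `plain`, `grouped`, and `comparison` are only illustrative and are not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 1, 2, 1, 1],
    'x': [10, 20, 100, 200, np.nan, np.nan, 300, np.nan],
})

# Plain forward fill ignores 'id': row 4 (id 1) picks up 200 from row 3 (id 2).
plain = df['x'].ffill()

# Grouped forward fill only looks back at earlier rows with the same 'id',
# so row 4 (id 1) gets 20 from row 1 instead.
grouped = df.groupby('id')['x'].ffill()

# Put both results side by side for inspection.
comparison = df.assign(plain=plain, grouped=grouped)
print(comparison)
```

In this data the two results differ only at row 4. Note that if a group's first 'x' is NaN, there is no earlier value to carry forward, so that entry should stay NaN after the grouped fill.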
unutbu answered Nov 07 '22 15:11