
Fill NaN based on previous value of row

Tags: python, pandas

I have a data frame (sample, not real):

df =

    A   B   C    D   E     F       

0   3   4   NaN  NaN NaN   NaN  
1   9   8   NaN  NaN NaN   NaN      
2   5   9   4    7   NaN   NaN  
3   5   7   6    3   NaN   NaN  
4   2   6   4    3   NaN   NaN  

Now I want to fill the NaN values in each row with the previous pair (!) of values from that row: each NaN pair takes the nearest existing pair of numbers to its left, repeated across the rest of the row. This should be applied to the whole dataset.

  • There are a lot of answers about filling along columns, but in this case I need to fill along rows.
  • There are also answers about filling NaN based on another column, but in my case there are more than 2000 columns. This is only sample data.

Desired output is:

df =

   A  B   C  D  E  F  

0  3  4   3  4  3  4  
1  9  8   9  8  9  8  
2  5  9   4  7  4  7      
3  5  7   6  3  6  3  
4  2  6   4  3  4  3  
Mamed asked Aug 27 '19 13:08


2 Answers

IIUC, a quick solution without reshaping the data:

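# columns at even positions (A, C, E): forward-fill along each row
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
# columns at odd positions (B, D, F): same thing
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
df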

Output:

   A  B  C  D  E  F
0  3  4  3  4  3  4
1  9  8  9  8  9  8
2  5  9  4  7  4  7
3  5  7  6  3  6  3
4  2  6  4  3  4  3
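
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and wraps the same slicing idea in a helper. The function name `fill_from_previous_group` and its `width` parameter are my own illustration, not part of the answer above; `width=2` reproduces the pair case, and the final `.astype(int)` assumes no NaN is left after filling.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": [3, 9, 5, 5, 2],
        "B": [4, 8, 9, 7, 6],
        "C": [np.nan, np.nan, 4, 6, 4],
        "D": [np.nan, np.nan, 7, 3, 3],
        "E": [np.nan] * 5,
        "F": [np.nan] * 5,
    }
)


def fill_from_previous_group(frame, width=2):
    """Fill NaN in each row from the column `width` positions to the left.

    Hypothetical helper: `width` is the group size (2 for pairs).
    """
    out = frame.copy()
    for offset in range(width):
        # Columns offset, offset + width, ... hold the same slot of every group,
        # so a row-wise forward fill over that slice copies the slot rightwards.
        out.iloc[:, offset::width] = out.iloc[:, offset::width].ffill(axis=1)
    return out


# Assumes every NaN has a value somewhere to its left in the same slot.
result = fill_from_previous_group(df, width=2).astype(int)
print(result)
```

Slicing with a step keeps the frame in its original wide shape, which matters when there are thousands of columns; only `width` slices are touched regardless of how many columns exist.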
Quang Hoang answered Oct 19 '22 00:10


The idea is to reshape the DataFrame so that missing values can be forward- and back-filled: relabel the columns with a two-level index built from integer division and modulo by 2 of the column positions, then use stack:

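import numpy as np

c = df.columns
a = np.arange(len(df.columns))
# two-level column labels: pair number (a // 2) and position within the pair (a % 2)
df.columns = [a // 2, a % 2]

# if some pairs may still be missing after filling, remove .astype(int)
df1 = df.stack().ffill(axis=1).bfill(axis=1).unstack().astype(int)
df1.columns = c
print(df1)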
   A  B  C  D  E  F
0  3  4  3  4  3  4
1  9  8  9  8  9  8
2  5  9  4  7  4  7
3  5  7  6  3  6  3
4  2  6  4  3  4  3

Detail:

print (df.stack())
     0    1   2
0 0  3  NaN NaN
  1  4  NaN NaN
1 0  9  NaN NaN
  1  8  NaN NaN
2 0  5  4.0 NaN
  1  9  7.0 NaN
3 0  5  6.0 NaN
  1  7  3.0 NaN
4 0  2  4.0 NaN
  1  6  3.0 NaN
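
To make the relabelling step easier to follow, here is a small sketch of what `a // 2` and `a % 2` produce for the six sample columns. The level names `pair` and `pos` are only labels I picked for illustration; the code above uses an unnamed two-level index.

```python
import numpy as np
import pandas as pd

cols = list("ABCDEF")
a = np.arange(len(cols))

# a // 2 numbers the pairs:                0, 0, 1, 1, 2, 2
# a % 2 gives the position inside a pair:  0, 1, 0, 1, 0, 1
labels = pd.MultiIndex.from_arrays([a // 2, a % 2], names=["pair", "pos"])

for col, (pair, pos) in zip(cols, labels):
    print(col, "-> pair", pair, "position", pos)
```

After `stack()` moves the position level into the row index, each remaining column is one pair number, so a fill along `axis=1` copies values from one pair to the next within the same position.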
jezrael answered Oct 19 '22 02:10