
Replacing empty values in a DataFrame with value of a column

Tags: python, pandas

Say I have the following pandas dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

which displays as:

   A    B    C    D
0  3  2.0  NaN  0.0
1  5  4.0  2.0  NaN
2  7  NaN  NaN  5.0
3  9  3.0  NaN  4.0

I'd like every np.nan to be replaced by the value in column A of the same row. The result should be:

   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0

I've tried multiple things, but I could not get anything to work. Can anyone help?

user498537 asked Nov 02 '18 14:11



3 Answers

A double transpose is necessary here:

cols = ['B','C', 'D']
df[cols] = df[cols].T.fillna(df['A']).T
print(df)
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0

because fillna cannot fill along axis=1 with a Series directly:

df[cols] = df[cols].fillna(df['A'], axis=1)
print(df)

NotImplementedError: Currently only can fill with dict/Series column by column
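
To see why the transpose helps, here is a minimal sketch that reuses the question's df. When fillna receives a Series, it matches the Series index against the column labels of the frame being filled; after transposing, those column labels are the original row labels 0..3, which is exactly the index of df['A']. The names transposed and filled are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

cols = ['B', 'C', 'D']

# After .T the rows are B, C, D and the columns are the original row labels 0..3.
transposed = df[cols].T

# fillna matches the index of df['A'] (0..3) against those column labels,
# so every original row is filled with its own A value.
filled = transposed.fillna(df['A']).T
print(filled)
```

Transposing a mixed-dtype frame copies the data, so this is convenient for small frames but may be slow on large ones.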

Another solution with numpy.where and broadcasting column A:

df = pd.DataFrame(np.where(df.isnull(), df['A'].values[:, None], df), 
                  index=df.index, 
                  columns=df.columns)
print(df)
     A    B    C    D
0  3.0  2.0  3.0  0.0
1  5.0  4.0  2.0  5.0
2  7.0  7.0  7.0  5.0
3  9.0  3.0  9.0  4.0
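
Note that column A comes back as float in the output above, because np.where returns a single array with one common dtype. A possible variation, sketched here under the assumption that only B, C and D need filling, applies np.where to those columns alone so that A keeps its integer dtype. The names a_column and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

cols = ['B', 'C', 'D']

# Shape (4, 1): one A value per row, broadcast across the three columns.
a_column = df['A'].to_numpy()[:, None]

result = df.copy()
# Where the cell is NaN take the row's A value, otherwise keep the cell.
result[cols] = np.where(df[cols].isna().to_numpy(), a_column, df[cols].to_numpy())
print(result)
```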

Thank you @pir for another solution:

df = pd.DataFrame(np.where(df.isnull(), df[['A']], df), 
                  index=df.index, 
                  columns=df.columns)
jezrael answered Oct 01 '22 19:10


Currently, fillna doesn't allow for broadcasting a series across columns while aligning the indices.

pandas.DataFrame.mask

This does exactly what we'd want fillna to do: it finds the nulls and fills them with df.A, aligned along axis=0.

df.mask(df.isna(), df.A, axis=0)

   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
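
As a self-contained sketch of the mask call, the snippet below also shows the complementary DataFrame.where form, which keeps values where the condition is True instead of replacing them. Using where here is an illustration added for comparison; the variable names masked and kept are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

# mask replaces cells where the condition is True; axis=0 aligns df['A'] on the row index.
masked = df.mask(df.isna(), df['A'], axis=0)

# where is the mirror image: it keeps cells where the condition is True.
kept = df.where(df.notna(), df['A'], axis=0)

print(masked)
```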

pandas.DataFrame.fillna using a dictionary

However, you can pass a dictionary to fillna that tells it what to do for each column.

df.fillna({k: df.A for k in df})

   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
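
The dictionary maps each column name to the Series used to fill that column, and each Series is aligned on the row index. A minimal sketch, assuming only B, C and D can contain nulls, builds the dictionary for just those columns; target_cols and filled are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

# Assumption for this sketch: A itself never needs filling.
target_cols = ['B', 'C', 'D']

# One entry per column; the Series df['A'] is matched row by row via the index.
filled = df.fillna({col: df['A'] for col in target_cols})
print(filled)
```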
piRSquared answered Oct 01 '22 19:10


Use fillna with reindex

df.fillna(df[['A']].reindex(columns=df.columns).ffill(axis=1))
Out[20]: 
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
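
Broken into steps, the one-liner builds a helper frame with the same shape as df in which every column holds the A values, then uses it as the fill source. The sketch below names the intermediate helper (an illustrative name). It relies on A being the first column, since ffill along axis=1 only copies values to the right.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

# Step 1: keep A and add B, C, D as all-NaN columns.
helper = df[['A']].reindex(columns=df.columns)

# Step 2: forward fill along each row so A's value spreads into B, C, D.
helper = helper.ffill(axis=1)

# Step 3: fill each NaN in df from the same position in helper.
result = df.fillna(helper)
print(result)
```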

Or combine_first

df.combine_first(df.fillna(0).add(df.A, axis=0))
Out[35]: 
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
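
The combine_first expression works because of what happens at the NaN positions: fillna(0) turns them into 0, and adding A row-wise turns them into 0 + A. The other cells of that intermediate frame hold value + A, but combine_first ignores them, since it only takes from its argument where df is null. A step-by-step sketch, with candidate and result as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

# NaN cells become 0 + A; every other cell becomes value + A (not used later).
candidate = df.fillna(0).add(df['A'], axis=0)

# Keep df where it has a value, fall back to candidate where df is NaN.
result = df.combine_first(candidate)
print(result)
```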
BENY answered Oct 01 '22 17:10