Say I have the following pandas dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))
which looks like this:
   A    B    C    D
0  3  2.0  NaN  0.0
1  5  4.0  2.0  NaN
2  7  NaN  NaN  5.0
3  9  3.0  NaN  4.0
Whenever a np.nan is found, I'd like it to be replaced by the value in column A of the same row. The result should then be:
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
I've tried multiple things, but I could not get anything to work. Can anyone help?
A double transpose is necessary here:
cols = ['B','C', 'D']
df[cols] = df[cols].T.fillna(df['A']).T
print(df)
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
The transposes are needed because filling along axis=1 with a Series is not implemented:
df[cols] = df[cols].fillna(df['A'], axis=1)
print(df)
NotImplementedError: Currently only can fill with dict/Series column by column
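To see why the transpose helps, here is a minimal sketch, assuming the df from the question. fillna with a Series matches the Series index against the column labels, and after .T the columns are the original row labels 0..3, which is exactly how df['A'] is indexed. The names transposed and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

cols = ['B', 'C', 'D']

# After transposing, the columns are the original row labels 0..3.
transposed = df[cols].T

# df['A'] is indexed 0..3 as well, so each transposed column
# is filled with the A value of the row it came from.
result = transposed.fillna(df['A']).T
print(result)
```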
Another solution uses numpy.where and broadcasts column A:
df = pd.DataFrame(np.where(df.isnull(), df['A'].values[:, None], df),
                  index=df.index,
                  columns=df.columns)
print(df)
     A    B    C    D
0  3.0  2.0  3.0  0.0
1  5.0  4.0  2.0  5.0
2  7.0  7.0  7.0  5.0
3  9.0  3.0  9.0  4.0
Thank you @pir for another solution:
df = pd.DataFrame(np.where(df.isnull(), df[['A']], df),
                  index=df.index,
                  columns=df.columns)
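Note that rebuilding the whole frame from np.where turns column A into floats, as the printed output above shows. If that matters, one possible variation (a sketch, not part of the original answers) is to apply np.where only to the columns that need filling and assign the result back, which leaves A untouched. The cols list is an assumption about which columns to fill.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

cols = ['B', 'C', 'D']  # assumed: only these columns should be filled

# df[['A']] has shape (4, 1), so it broadcasts across the three target columns.
df[cols] = np.where(df[cols].isna().to_numpy(),
                    df[['A']].to_numpy(),
                    df[cols].to_numpy())
print(df)
```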
Currently, fillna doesn't allow for broadcasting a Series across columns while aligning the indices.
pandas.DataFrame.mask
This does exactly what we'd want fillna to do: it finds the nulls and fills them in with df.A along axis=0.
df.mask(df.isna(), df.A, axis=0)
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
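If this comes up often, the mask call can be wrapped in a small function. The following is only a sketch: fill_from_column is a hypothetical helper name, not a pandas API, and it assumes the source column itself contains no NaN.

```python
import numpy as np
import pandas as pd


def fill_from_column(frame, source):
    """Hypothetical helper: replace each NaN with the value that
    `source` holds in the same row. Returns a new DataFrame."""
    return frame.mask(frame.isna(), frame[source], axis=0)


df = pd.DataFrame([[3, 2, np.nan, 0],
                   [5, 4, 2, np.nan],
                   [7, np.nan, np.nan, 5],
                   [9, 3, np.nan, 4]],
                  columns=list('ABCD'))

filled = fill_from_column(df, 'A')
print(filled)
```

mask returns a new object by default, so df itself keeps its NaN values unless you assign the result back.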
pandas.DataFrame.fillna using a dictionary
However, you can pass a dictionary to fillna that tells it what to fill each column with.
df.fillna({k: df.A for k in df})
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
Do fillna with reindex:
df.fillna(df[['A']].reindex(columns=df.columns).ffill(axis=1))
Out[20]:
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0
Or use combine_first:
df.combine_first(df.fillna(0).add(df.A, axis=0))
Out[35]:
   A    B    C    D
0  3  2.0  3.0  0.0
1  5  4.0  2.0  5.0
2  7  7.0  7.0  5.0
3  9  3.0  9.0  4.0