I have a DataFrame with a mixture of string and float columns. The float columns all hold whole numbers and were only converted to floats because there were missing values. I want to fill the NaN values in the numeric columns with zero while leaving the NaN values in the string columns alone. Here is what I have currently:
df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)
This doesn't work, and I think it is because .select_dtypes() returns a view of the DataFrame, so the .fillna() has no effect. Is there a similar method that fills the NaNs in only the float columns?
Pandas replace NaN with 0 in place: in this method, inplace=True makes fillna fill the null values and modify the original DataFrame directly instead of returning a new one.
Use either DF.combine_first (does not act in place):
df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))
or DF.update (modifies in place):
df.update(df.select_dtypes(include=[np.number]).fillna(0))
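For anyone who wants to try both options side by side, here is a minimal sketch. The sample frame and the variable names (filled, combined) are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Sample frame (hypothetical): one string column, two numeric columns.
df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# Numeric columns only, with NaN replaced by 0. This is a new DataFrame.
filled = df.select_dtypes(include=[np.number]).fillna(0)

# Option 1: combine_first returns a new frame and leaves df unchanged.
combined = df.combine_first(filled)

# Option 2: update writes the non-NaN values of `filled` into df itself.
df.update(filled)
```

In both cases the NaN in the string column A should be left alone, because A is not among the selected numeric columns.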
fillna fails because DF.select_dtypes returns a completely new DataFrame. Although it holds a subset of the original DF's columns, it is not part of the original and behaves as an independent object, so modifications made to it do not affect the DF it was derived from.
Note that np.number selects all numeric types.
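A quick way to check both points is the sketch below. The frame and its column names (name, count, score) are hypothetical, chosen only to include an int and a float column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", None, "c"],     # string column
    "count": [1, 2, 3],           # integer column
    "score": [1.5, np.nan, 3.0],  # float column
})

# np.number matches both the integer and the float column.
numeric = df.select_dtypes(include=[np.number])
numeric_cols = list(numeric.columns)

# Filling the selection in place changes `numeric` only, not df.
numeric.fillna(0, inplace=True)
nan_left_in_df = int(df["score"].isna().sum())
nan_left_in_subset = int(numeric["score"].isna().sum())
```

Here numeric_cols should contain count and score, and the NaN in df should still be there after the in-place fill on the selection.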
Your pandas.DataFrame.select_dtypes approach is good; you've just got to cross the finish line:
>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
A B C
0 NaN NaN 4.0
1 string NaN NaN
2 string 3.0 5.0
3 more string 4.0 6.0
Don't try to perform the in-place fillna here (there's a time and place for inplace=True, but this is not one). select_dtypes returns a new DataFrame, so filling it in place never touches df. Instead, create a new DataFrame called filled and join the filled (or "fixed") columns back onto your original data:
>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
B C
0 0.0 4.0
1 0.0 0.0
2 3.0 5.0
3 4.0 6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
A B C B_filled C_filled
0 NaN NaN 4.0 0.0 4.0
1 string NaN NaN 0.0 0.0
2 string 3.0 5.0 3.0 5.0
3 more string 4.0 6.0 4.0 6.0
Then you can drop whatever original columns you had to keep only the "filled" ones:
>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
A B_filled C_filled
0 NaN 0.0 4.0
1 string 0.0 0.0
2 string 3.0 5.0
3 more string 4.0 6.0
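If the _filled suffix is unwanted, one possible alternative (not taken from the answers above, so treat it as a sketch) is to assign the filled values back by column label. The names num_cols and whole are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# Labels of the numeric columns only.
num_cols = df.select_dtypes(include=[np.number]).columns

# Assign the filled values back under the original column names.
df[num_cols] = df[num_cols].fillna(0)

# Optional: a separate integer copy, since the values are whole numbers.
whole = df[num_cols].astype("int64")
```

This keeps the original column names B and C, and the integer cast is only safe once no NaN values remain in those columns.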