
I am trying to fill all NaN values in columns with numeric data types with zero in pandas

I have a DataFrame with a mixture of string and float columns. The float columns all hold whole numbers and were only converted to floats because there were missing values. I want to fill all the NaN values in the numeric columns with zero while leaving the NaN in the string columns alone. Here is what I have currently:

df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)

This doesn't work, and I think it is because .select_dtypes() returns a view of the DataFrame, so the .fillna() has no effect. Is there a similar method that fills the NaNs in only the float columns?

Don Quixote asked Mar 24 '17 15:03



2 Answers

Use either DF.combine_first (does not act inplace):

df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))

or DF.update (modifies inplace):

df.update(df.select_dtypes(include=[np.number]).fillna(0))

fillna fails because DF.select_dtypes returns a completely new DataFrame. Although it holds a subset of the original DF's columns, it is not part of the original; it behaves as an independent object, so modifications made to it do not affect the DF it was derived from.
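To make the difference concrete, here is a minimal sketch using a small made-up frame (the column names A, B, C and the helper variables are illustrative assumptions, not from the question). It checks that the original frame still has its NaNs after the subset is filled, and that DF.update then writes the filled values back:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one string column, two float columns.
df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# select_dtypes builds a separate DataFrame, so filling it leaves df untouched.
subset = df.select_dtypes(include=[np.number]).fillna(0)
nans_in_df_before = int(df[["B", "C"]].isna().sum().sum())

# update() copies the non-NaN values of the filled subset back into df.
df.update(subset)
nans_in_df_after = int(df[["B", "C"]].isna().sum().sum())

# The string column is not part of the subset, so its NaN is left alone.
a_still_missing = bool(pd.isna(df.loc[0, "A"]))
```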

Note that np.number selects all numeric types.
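As a quick illustration of what that selection covers, the sketch below uses a hypothetical frame (the column names are made up for this example) with string, integer, float, and boolean columns. Integer and float columns are picked up; boolean and string columns are not:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column of each kind.
df = pd.DataFrame({
    "name": ["a", "b", None],       # strings
    "count": [1, 2, 3],             # integers
    "score": [1.5, np.nan, 2.5],    # floats
    "flag": [True, False, True],    # booleans
})

# np.number matches integer and float dtypes, but not bool or strings.
numeric_cols = list(df.select_dtypes(include=[np.number]).columns)
print(numeric_cols)
```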

Nickil Maveli answered Nov 15 '22 05:11


Your pandas.DataFrame.select_dtypes approach is good; you've just got to cross the finish line:

>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
             A    B    C
0          NaN  NaN  4.0
1       string  NaN  NaN
2       string  3.0  5.0
3  more string  4.0  6.0

Don't try to perform the in-place fillna here (there's a time and place for inplace=True, but this is not one). What select_dtypes returns is a new DataFrame rather than a view of df, so filling it in place never touches df. Create a new dataframe called filled and join the filled (or "fixed") columns back with your original data:

>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
     B    C
0  0.0  4.0
1  0.0  0.0
2  3.0  5.0
3  4.0  6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
             A    B    C  B_filled  C_filled
0          NaN  NaN  4.0       0.0       4.0
1       string  NaN  NaN       0.0       0.0
2       string  3.0  5.0       3.0       5.0
3  more string  4.0  6.0       4.0       6.0

Then you can drop whatever original columns you had to keep only the "filled" ones:

>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
             A  B_filled  C_filled
0          NaN       0.0       4.0
1       string       0.0       0.0
2       string       3.0       5.0
3  more string       4.0       6.0
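If the _filled suffix is unwanted, a shorter variant is to assign the filled columns straight back under their original names. This is only a sketch of an alternative, not part of the approach above; num_cols is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# Collect the labels of the numeric columns only.
num_cols = df.select_dtypes(include=[np.number]).columns

# Fill those columns and assign the result back under the same labels;
# column A is never selected, so its NaN stays as it is.
df[num_cols] = df[num_cols].fillna(0)
print(df)
```

The column names and order stay the same, so no join or drop step is needed afterwards.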
blacksite answered Nov 15 '22 05:11