How do I fill all NaN values in numeric columns with zero in pandas?

Tags:

python

pandas

missing-data

I have a DataFrame with a mixture of string and float columns. The float columns all hold whole numbers and were only converted to floats because there were missing values. I want to fill the NaNs in the numeric columns with zero while leaving the NaNs in the string columns alone. Here is what I have currently.

df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)

This doesn't work, and I think it is because .select_dtypes() returns a view of the DataFrame, so the .fillna() has no effect. Is there a similar method that fills the NaNs in only the float columns?

asked Mar 24 '17 15:03

Don Quixote

2 Answers

Use either DF.combine_first (returns a new DataFrame; does not act in place):

df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))

or DF.update (modifies df in place):

df.update(df.select_dtypes(include=[np.number]).fillna(0))

The reason fillna fails is that DF.select_dtypes returns a completely new DataFrame. It holds a subset of the original DF's columns, but it is not part of the original; it is a separate object, so modifications made to it do not affect the DF it was derived from.

Note that np.number selects all numeric types.
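The two calls above can be compared side by side in a minimal sketch. The sample DataFrame and the variable names (filled, combined, nans_before) are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one string column and two numeric columns with gaps.
df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# select_dtypes hands back a separate DataFrame; filling it never touches df.
filled = df.select_dtypes(include=[np.number]).fillna(0)
nans_before = int(df[["B", "C"]].isna().sum().sum())  # df still has its NaNs

# combine_first returns a new DataFrame and leaves df as it was.
combined = df.combine_first(filled)

# update writes the values of `filled` into df itself.
df.update(filled)
```

After this runs, combined and df should both have zeros in B and C, while the NaN in the string column A is left alone in each.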

answered Nov 15 '22 05:11

Nickil Maveli

Your pandas.DataFrame.select_dtypes approach is good; you've just got to cross the finish line:

>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
             A    B    C
0          NaN  NaN  4.0
1       string  NaN  NaN
2       string  3.0  5.0
3  more string  4.0  6.0

Don't try to perform the in-place fillna here (there's a time and place for inplace=True, but this is not one). What select_dtypes returns is a new DataFrame, so an in-place fill on it never reaches df. Create a new DataFrame called filled and join the filled (or "fixed") columns back with your original data:

>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
     B    C
0  0.0  4.0
1  0.0  0.0
2  3.0  5.0
3  4.0  6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
             A    B    C  B_filled  C_filled
0          NaN  NaN  4.0       0.0       4.0
1       string  NaN  NaN       0.0       0.0
2       string  3.0  5.0       3.0       5.0
3  more string  4.0  6.0       4.0       6.0

Then you can drop whatever original columns you had to keep only the "filled" ones:

>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
             A  B_filled  C_filled
0          NaN       0.0       4.0
1       string       0.0       0.0
2       string       3.0       5.0
3  more string       4.0       6.0
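If the _filled suffixes are not wanted, one alternative (not part of the answer above, just a sketch on the same sample data) is to assign the filled columns straight back under their original names. The name num_cols is made up here, and the final cast to int64 assumes, as the question states, that the numeric columns only hold whole numbers.

```python
import numpy as np
import pandas as pd

# Same hypothetical sample data as in the answer above.
df = pd.DataFrame({
    "A": [np.nan, "string", "string", "more string"],
    "B": [np.nan, np.nan, 3, 4],
    "C": [4, np.nan, 5, 6],
})

# Names of the numeric columns only.
num_cols = df.select_dtypes(include=[np.number]).columns

# Fill NaNs in those columns and write the result back under the same names.
df[num_cols] = df[num_cols].fillna(0)

# Optional: with no NaNs left, the whole-number floats can be cast back to int.
df = df.astype({col: "int64" for col in num_cols})
```

This keeps the column names and order unchanged, so no drop step is needed afterwards.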

answered Nov 15 '22 05:11

blacksite
