Why does testing `NaN == NaN` not work for dropping from a pandas DataFrame?

Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My DataFrame, which I load from a CSV file using read_csv, has a column comments, which is empty most of the time.

The column marked_results.comments looks like this; all the rest of the column is NaN. So pandas loads empty entries as NaN, so far so good:

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries. Only this works:

  • marked_results.comments.isnull()

All these don't work:

  • marked_results.comments.dropna() only gives back the same column; nothing gets dropped. Confusing.
  • marked_results.comments == NaN only gives a Series that is all False. Nothing comes up as NaN... confusing.
  • likewise marked_results.comments == nan

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!
asked Jul 31 '13 by idoda

2 Answers

You should use isnull and notnull to test for NaN (these handle pandas dtypes more robustly than the numpy functions); see "values considered missing" in the docs.
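
For example, here is a minimal sketch (assuming pandas imported as pd and numpy as np, with a toy DataFrame named df built to match the column in the question):

import pandas as pd
import numpy as np

# A small frame shaped like the question's column.
df = pd.DataFrame({'comments': ['VP', 'VP', 'VP', 'TEST', np.nan, np.nan]})

df['comments'].isnull()     # True for rows 4 and 5, False elsewhere
df['comments'].notnull()    # the complement: True where a value is present

# Equality never matches NaN, because NaN != NaN by definition:
(df['comments'] == np.nan).any()   # False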

Using the Series method dropna on a column won't affect the original DataFrame, but it does do what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object

The dropna DataFrame method has a subset argument (to drop rows which have NaNs in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
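
If you prefer boolean indexing, the following is an equivalent spelling (a sketch against the same df):

# Keep only the rows whose comments entry is not NaN;
# same result as df.dropna(subset=['comments']).
df = df[df['comments'].notnull()]
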
answered Nov 09 '22 by Andy Hayden

You need to test for NaN with the math.isnan() function (or numpy.isnan). NaN cannot be checked with the equality operator.

>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> from math import isnan
>>> isnan(a)
True
>>> a == float('NaN')
False

help(isnan) shows:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
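
For completeness, a short sketch of the NumPy and pandas equivalents (assuming numpy and pandas are installed):

>>> import math
>>> import numpy as np
>>> import pandas as pd
>>> x = float('nan')
>>> x != x          # NaN is the only float value not equal to itself
True
>>> math.isnan(x)
True
>>> np.isnan(x)
True
>>> pd.isnull(x)    # also handles None and works element-wise on a Series
True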
answered Nov 09 '22 by Sukrit Kalra