What is the difference between NaN and None?

Tags: python, pandas, nan, numpy

I am reading two columns of a CSV file using pandas read_csv() and then assigning the values to a dictionary. The columns contain strings of numbers and letters. Occasionally a cell is empty. In my opinion, the value read into that dictionary entry should be None, but nan is assigned instead. Surely None is more descriptive of an empty cell, as it is a null value, whereas nan just says that the value read is not a number.

Is my understanding correct? What IS the difference between None and nan? Why is nan assigned instead of None?

Also, my dictionary check for any empty cells has been using numpy.isnan():

for k, v in my_dict.items():  # .iteritems() on Python 2
    if np.isnan(v):  # raises TypeError when v is a string
        ...

But this gives me an error saying that I cannot use this check for v. I guess it is because np.isnan() expects an integer or float, not a string. If this is true, how can I check v for an "empty cell"/nan case?

asked Jul 08 '13 19:07

user1083734

1 Answer

NaN is used consistently as a placeholder for missing data in pandas, and that consistency is good. I usually read/translate NaN as "missing". Also see the 'working with missing data' section in the docs.

Wes writes in the docs 'choice of NA-representation':

After years of production use [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions isnull and notnull which can be used across the dtypes to detect NA values.
...
Thus, I have chosen the Pythonic “practicality beats purity” approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.

Note the "gotcha": integer Series containing missing data are upcast to floats.
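The upcasting gotcha can be seen in a minimal sketch like the one below. The Series names are made up for illustration, and the exact integer width (int64 vs. int32) may depend on the platform and NumPy version.

```python
import numpy as np
import pandas as pd

# A Series of whole numbers starts out with an integer dtype.
s_int = pd.Series([1, 2, 3])
print(s_int.dtype)

# Reindexing onto a label that does not exist (3) introduces a missing
# value, which forces the whole Series to be promoted to float64.
s_missing = s_int.reindex([0, 1, 2, 3])
print(s_missing.dtype)
print(s_missing.isnull().tolist())

# The same promotion happens when None appears in otherwise numeric input.
s_from_none = pd.Series([1, None])
print(s_from_none.dtype)
```

The integers 1, 2, 3 come back as 1.0, 2.0, 3.0 after the promotion, which is the price paid for being able to store NaN in the same array.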

In my opinion, the main reason to use NaN (over None) is that it can be stored with numpy's float64 dtype rather than the less efficient object dtype; see NA type promotions.

import numpy as np
import pandas as pd

# Without forcing dtype=object, pandas changes None to NaN!
s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

In [13]: s_bad.dtype
Out[13]: dtype('O')

In [14]: s_good.dtype
Out[14]: dtype('float64')

Jeff comments (below) on this:

np.nan allows for vectorized operations; its a float value, while None, by definition, forces object type, which basically disables all efficiency in numpy.

So repeat 3 times fast: object==bad, float==good
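To make the float-versus-object distinction concrete at the plain Python level, here is a small sketch. It only illustrates the general behaviour of the two values and is not taken from the pandas docs; the array name is arbitrary.

```python
import math
import numpy as np

# NaN is an ordinary float value; None is Python's null singleton.
print(type(np.nan))   # float
print(type(None))     # NoneType

# NaN never compares equal to anything, not even itself,
# so equality checks cannot be used to detect it.
print(np.nan == np.nan)
print(math.isnan(np.nan))

# Because NaN is a float, it fits in a float64 array and takes part in
# vectorized arithmetic, propagating through the result.
arr = np.array([1.0, np.nan])
shifted = arr + 1
print(shifted)

# None has no arithmetic defined, so the same operation fails.
try:
    None + 1
    none_supports_addition = True
except TypeError:
    none_supports_addition = False
print(none_supports_addition)
```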

That said, many operations may still work just as well with None as with NaN (though they are perhaps not supported, i.e. they may sometimes give surprising results):

In [15]: s_bad.sum()
Out[15]: 1

In [16]: s_good.sum()
Out[16]: 1.0

To answer the second question:
You should be using pd.isnull and pd.notnull to test for missing data (NaN).
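Applied to the dictionary loop from the question, that could look like the sketch below. The dictionary contents are invented for illustration; only the my_dict name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary built from the two CSV columns.
my_dict = {"a": "x1", "b": np.nan, "c": "y2", "d": None}

# pd.isnull works on strings, floats and None alike,
# so it is safe to call on every value.
empty_keys = [k for k, v in my_dict.items() if pd.isnull(v)]
print(empty_keys)

# np.isnan, by contrast, rejects non-numeric input such as strings.
try:
    np.isnan("x1")
    isnan_accepts_strings = True
except TypeError:
    isnan_accepts_strings = False
print(isnan_accepts_strings)
```

pd.notnull is the inverse check, which is handy for keeping only the filled-in entries.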

answered Oct 01 '22 21:10

Andy Hayden
