In Pandas, missing data is represented by two values:
None: a Python singleton object that is often used for missing data in Python code.
NaN: an acronym for "Not a Number", a special floating-point value recognized by all systems that use the standard IEEE 754 floating-point representation.
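As a quick illustration of the difference (a minimal sketch, assuming only NumPy is installed):

import numpy as np

# None is a plain Python object; np.nan is an IEEE 754 floating-point value.
print(type(None))        # <class 'NoneType'>
print(type(np.nan))      # <class 'float'>
print(np.nan == np.nan)  # False -- NaN never compares equal to itself
print(None is None)      # True  -- None is a singleton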
In NumPy, to replace missing NaN values (np.nan) in an ndarray with other numbers, use np.nan_to_num(), or build a boolean mask with np.isnan() and substitute through it.
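For example (a minimal sketch; the nan= keyword of np.nan_to_num requires NumPy 1.17 or later):

import numpy as np

x = np.array([1.0, np.nan, 3.0])

# nan_to_num fills NaN with 0 by default, or with a value you choose.
print(np.nan_to_num(x))            # [1. 0. 3.]
print(np.nan_to_num(x, nan=-1.0))  # [ 1. -1.  3.]

# isnan gives a boolean mask you can combine with np.where or indexing.
print(np.where(np.isnan(x), 0.0, x))  # [1. 0. 3.]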
@bogatron has it right, you can use where; it's also worth noting that you can do this natively in pandas:
df1 = df.where(pd.notnull(df), None)
Note: this changes the dtype of all columns to object.
Example:
In [1]: df = pd.DataFrame([1, np.nan])
In [2]: df
Out[2]:
0
0 1
1 NaN
In [3]: df1 = df.where(pd.notnull(df), None)
In [4]: df1
Out[4]:
0
0 1
1 None
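To verify the dtype change noted above (exact repr formatting may vary slightly across pandas versions):

In [5]: df.dtypes
Out[5]:
0    float64
dtype: object

In [6]: df1.dtypes
Out[6]:
0    object
dtype: object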
Note: what you cannot do is recast the DataFrame's dtype to allow all datatypes using astype and then use the DataFrame's fillna or replace method:
df1 = df.astype(object).replace(np.nan, 'None')
Unfortunately neither this, nor using replace, works with an actual None; see this (closed) issue.
As an aside, it's worth noting that for most use cases you don't need to replace NaN with None; see this question about the difference between NaN and None in pandas.
However, in this specific case it seems you do (at least at the time of this answer).
df = df.replace({np.nan: None})
Credit goes to this comment on this GitHub issue.
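A minimal sketch of how that looks in practice (the column name and data here are made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
out = df.replace({np.nan: None})

# The column is upcast to object so it can hold a real Python None.
print(out["a"].dtype)            # object
print(out["a"].iloc[1] is None)  # True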
You can replace nan with None in your NumPy array:
>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print(y)
[1.0 None 3.0]
>>> print(type(y[1]))
<class 'NoneType'>
After stumbling around, this worked for me:
df = df.astype(object).where(pd.notnull(df), None)
Another addition: be careful when replacing multiple values and converting the type of the column back from object to float. If you want to be certain that your None values won't flip back to np.nan, apply @andy-hayden's suggestion and use DataFrame.where.
Illustration of how replace can still go 'wrong':
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({"a": [1, np.nan, np.inf]})
In [4]: df
Out[4]:
a
0 1.0
1 NaN
2 inf
In [5]: df.replace({np.nan: None})
Out[5]:
a
0 1
1 None
2 inf
In [6]: df.replace({np.nan: None, np.inf: None})
Out[6]:
a
0 1.0
1 NaN
2 NaN
In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
a
0 1.0
1 NaN
2 NaN
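One way to avoid the flip-back entirely (a sketch, assuming you want inf treated the same as NaN): map inf to NaN first, while the column is still float, and make the None conversion the very last step.

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.inf]})

# Replace +/-inf with NaN while the dtype is still float64 ...
cleaned = df.replace([np.inf, -np.inf], np.nan)

# ... then convert every remaining NaN to None in one final pass.
result = cleaned.astype(object).where(pd.notnull(cleaned), None)

print(result["a"].tolist())  # expected: [1.0, None, None]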
Just an addition to @Andy Hayden's answer: since DataFrame.mask is the opposite twin of DataFrame.where, the two have exactly the same signature but opposite meaning:
DataFrame.where is useful for replacing values where the condition is False.
DataFrame.mask is used for replacing values where the condition is True.
So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.
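A small sketch of the equivalence with made-up data (behaviour may vary slightly across pandas versions):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan]})

via_where = df.where(df.notna(), None)  # keep values where the condition is True
via_mask = df.mask(df.isna(), None)     # replace values where the condition is True

print(via_where.equals(via_mask))       # expected: True
print(via_mask["a"].iloc[1] is None)    # expected: True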
Quite old, yet I stumbled upon the very same issue. Try doing this:
df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)