
Use None instead of np.nan for null values in pandas DataFrame

I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of the default np.nan). For some reason, this appears to be nearly impossible.

In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.

import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]

I can't do:

>>> df.fillna(None)
ValueError: must specify a fill method or value

nor:

>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

nor:

>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame

I used to have a DataFrame with only string values, so I could do:

>>> df[df == ""] = None 

which worked. But now that I have mixed datatypes, it's a no go.

For various reasons in my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I'm using np.isnan or pd.isnull everywhere?

J Jones asked Sep 01 '16 19:09



People also ask

How do you replace NaN with None in pandas?

Use the df.replace(np.nan, '', regex=True) method to replace all NaN values with an empty string in a pandas DataFrame.

Is None same as NaN in pandas?

In summary, NaN and None are different data types in Python. However, when it comes to detecting and eliminating missing values, pandas.DataFrame treats NaN and None similarly. To detect missing values, df.

Is None the same as np.nan?

NaN can be used as a numerical value in mathematical operations, while None cannot (or at least shouldn't). NaN is a numeric value, as defined in the IEEE 754 floating-point standard. None is Python's built-in null object (of type NoneType) and is closer to "nonexistent" or "empty" than to "numerically invalid" in this context.

Is None null in pandas?

notnull is a pandas function that examines one or more values to check that they are not null. In pandas, null values are represented as NaN (not a number) or None to signify that no data is present. notnull returns False where either NaN or None is detected and True otherwise.
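As a small illustration of the point above, the following sketch checks how pandas and NumPy treat the two null markers. The names values, mask, and isnan_accepts_none are made up for this example.

```python
import numpy as np
import pandas as pd

# An object-dtype Series holding both kinds of "missing" marker.
values = pd.Series([1, "two", np.nan, None], dtype=object)

# pandas flags NaN and None alike as missing.
mask = values.isnull()

# np.isnan only understands numeric input, so None raises a TypeError.
try:
    np.isnan(None)
    isnan_accepts_none = True
except TypeError:
    isnan_accepts_none = False
```

So code that tests for missing values with pd.isnull or pd.notnull keeps working whether the frame holds NaN or None, while np.isnan does not accept None.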


1 Answer

Use pd.DataFrame.where
It keeps the value from df wherever the condition is True and uses None everywhere else:

df.where(df.notnull(), None) 
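A minimal end-to-end sketch of the same idea, assuming a frame that also contains a float column (the column names num and text are made up here). The astype(object) cast is an addition to the answer's one-liner: in recent pandas versions a float column cannot hold None, so without the cast the missing entries in it may stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float column, one text column, each with a missing value.
df = pd.DataFrame({"num": [1.0, np.nan], "text": ["two", np.nan]})

# Cast to object so every column can store None, then keep the original
# value where it is not null and substitute None elsewhere.
result = df.astype(object).where(df.notnull(), None)

print(result)
```

The frame in the question is already all object dtype, so there the cast changes nothing; it only matters once numeric columns are involved, for example after reading a csv.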

piRSquared answered Oct 10 '22 02:10
