I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of the default np.nan). For some reason, this appears to be nearly impossible.
In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]
I can't do:
>>> df.fillna(None)
ValueError: must specify a fill method or value
nor:
>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
nor:
>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame
I used to have a DataFrame with only string values, so I could do:
>>> df[df == ""] = None
which worked. But now that I have mixed datatypes, it's a no go.
For reasons specific to my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I'm using np.isnan or pd.isnull everywhere?
Use df.replace(np.nan, '', regex=True) to replace all NaN values with an empty string in a pandas DataFrame.
In summary, NaN and None are different types of object in Python. However, when it comes to detecting and removing missing values, pandas.DataFrame treats NaN and None similarly. To detect missing values, df.
NaN can be used as a numerical value in mathematical operations, while None cannot (or at least shouldn't). NaN is a numeric value, as defined in the IEEE 754 floating-point standard. None is a built-in Python object (of type NoneType) and means something closer to "nonexistent" or "empty" than "numerically invalid" in this context.
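The difference can be seen in plain Python without pandas. This is a small illustrative sketch; the variable names are made up for the example.

```python
import math

nan = float("nan")

# NaN is a float, so arithmetic is allowed and simply propagates NaN.
nan_sum = nan + 1
print(type(nan).__name__, math.isnan(nan_sum))  # float True

# NaN compares unequal to everything, including itself.
print(nan == nan)  # False

# None is the single instance of NoneType; arithmetic on it raises TypeError.
try:
    None + 1
    none_arithmetic_ok = True
except TypeError:
    none_arithmetic_ok = False
print(type(None).__name__, none_arithmetic_ok)  # NoneType False
```

This is why equality checks such as `x == np.nan` never match, and why code that mixes the two usually relies on a dedicated missing-value test instead.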
notnull is a pandas function that checks one or more values and reports whether they are non-null. In pandas, missing values are represented as NaN (not a number) or None to signify that no data is present. .notnull returns False where it detects NaN or None, and True otherwise.
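As a quick illustration, here is notnull applied to a small object-dtype Series holding both kinds of missing value. The Series contents are invented for the example.

```python
import numpy as np
import pandas as pd

# An object-dtype Series keeps np.nan and None as distinct objects.
values = pd.Series([1, "two", np.nan, None], dtype=object)

# notnull flags both NaN and None as missing.
mask = values.notnull()
print(mask.tolist())  # [True, True, False, False]

# The top-level function also accepts scalars.
print(pd.notnull(None), pd.notnull(np.nan))  # False False
```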
Use pd.DataFrame.where, which keeps the df value where the condition is met and otherwise uses None:
df.where(df.notnull(), None)
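A caveat worth hedging: on numeric (float) columns, some pandas versions coerce the None back to NaN, so the one-liner may appear to do nothing. A commonly used workaround, sketched here under the assumption that losing the numeric dtype is acceptable, is to cast to object first. The column names and data below are invented for the example.

```python
import numpy as np
import pandas as pd

# A float column and a string column, each with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["two", None]})

# Casting to object first lets every column hold a real None;
# where() keeps the original value wherever the mask is True.
result = df.astype(object).where(df.notnull(), None)

print(result["a"].tolist())
print(result["b"].tolist())
```

The trade-off is that every column becomes object dtype, so vectorised numeric operations on the result will be slower or unavailable. If only a few columns need None, applying the same pattern to just those columns keeps the rest numeric.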