How to set a cell to NaN in a pandas dataframe

People also ask

How do I create a NaN column in pandas?

Adding a single column: just assign a missing value to the new column, e.g. df['C'] = np.nan.
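A minimal sketch of that assignment. It assumes a small DataFrame built only for illustration; the column names a, b and C are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; any existing DataFrame works the same way.
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Assigning a scalar broadcasts it to every row, creating an all-NaN column.
df['C'] = np.nan

print(df['C'].isna().all())  # True
print(df['C'].dtype)         # float64
```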

How do you define null values in pandas?

To check for null values in a pandas DataFrame, use the isnull() function (an alias of isna()). It returns a DataFrame of Boolean values that are True wherever a value is missing (NaN, and also None in object columns).
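A short sketch of isnull() in practice, assuming a hypothetical frame with one missing value per column. Summing the Boolean mask is a common way to count missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({'x': [10, np.nan, 18], 'y': ['12', '11', None]})

mask = df.isnull()   # Boolean DataFrame, True where a value is missing
print(mask)
print(mask.sum())    # number of missing values per column
```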

Just use replace:

In [106]:
df.replace('N/A', np.nan)

Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
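The question's original DataFrame is not shown on this page. The sketch below assumes it matches the output above, with 'N/A' placeholders among numeric strings in column y, so the replace call can be run end to end:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data: 'N/A' strings in column y.
df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

# replace returns a new DataFrame; assign it back to keep the result.
df = df.replace('N/A', np.nan)

print(df)
print(df['y'].isna().sum())  # 2
```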

What you're trying is called chained indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

You can use loc to ensure you operate on the original df:

In [108]:
df.loc[df['y'] == 'N/A','y'] = np.nan
df

Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
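A self-contained version of the loc approach, again assuming the question's data looks like the output above. The single .loc call selects the rows (via a Boolean mask) and the column together, so the write lands in df itself; the chained form df['y'][mask] = np.nan may warn or leave df unchanged, depending on the pandas version:

```python
import numpy as np
import pandas as pd

# Assumed data, mirroring the example output: 'N/A' strings in column y.
df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

# Rows and column are selected in ONE indexing call, so df is modified directly.
mask = df['y'] == 'N/A'
df.loc[mask, 'y'] = np.nan

print(df['y'].isna().sum())  # 2
```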

While using replace seems to solve the problem, I would like to propose an alternative. With a mix of numeric and string values in a column, the real issue is not that the strings need to be replaced with np.nan, but that the whole column needs the proper type. I would bet that the original column is most likely of object type:

Name: y, dtype: object

What you really need is to make it a numeric column (it will have the proper dtype and will be considerably faster to work with), with all non-numeric values replaced by NaN.

Thus, a good conversion would be:

df['y'] = pd.to_numeric(df['y'], errors='coerce')

Specify errors='coerce' to force strings that can't be parsed as a number to become NaN. The column type will then be:

Name: y, dtype: float64
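A runnable sketch of the conversion, assuming the same 'N/A'-containing data as above. After coercion the column is float64, so numeric operations such as sum() work and simply skip the NaN entries:

```python
import pandas as pd

# Assumed data: numeric strings plus an 'N/A' placeholder in column y.
df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
df['y'] = pd.to_numeric(df['y'], errors='coerce')

print(df['y'].dtype)  # float64
print(df['y'].sum())  # 51.0 (NaN values are skipped)
```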

You can use replace:

df['y'] = df['y'].replace({'N/A': np.nan})

Also be aware of the inplace parameter for replace. You can do something like:

df.replace({'N/A': np.nan}, inplace=True)

This replaces all matching values in df itself instead of returning a new DataFrame (the call returns None).
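A small sketch of that behaviour, on an assumed three-row frame. The key detail is the return value: with inplace=True the method returns None, so writing df = df.replace(..., inplace=True) would throw the data away:

```python
import numpy as np
import pandas as pd

# Assumed example data.
df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

# inplace=True mutates df and returns None; do NOT assign the result back to df.
result = df.replace({'N/A': np.nan}, inplace=True)

print(result)                # None
print(df['y'].isna().sum())  # 1
```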

Similarly, if you run into other kinds of unknown values, such as empty strings or None values:

df['y'] = df['y'].replace({'': np.nan})

df['y'] = df['y'].replace({None: np.nan})
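A sketch that handles several placeholders in one call, assuming a hypothetical Series mixing '', 'N/A' and None. Note that isna() already treats None as missing, so converting it is only needed if you want a single uniform marker:

```python
import numpy as np
import pandas as pd

# Assumed data containing several kinds of "unknown" markers.
s = pd.Series(['12', '', 'N/A', None, '15'])

# One dict maps each placeholder string to NaN in a single pass.
cleaned = s.replace({'': np.nan, 'N/A': np.nan})

print(s.isna().tolist())     # [False, False, False, True, False]: None already counts
print(cleaned.isna().sum())  # 3
```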

Reference: Pandas Latest - Replace

Most of the answers above require importing an extra module: import numpy as np

There is a built-in solution in pandas itself, pd.NA, which can be used like this:

df.replace('N/A', pd.NA)

As of pandas 1.0.0, you no longer need numpy to create null values in your DataFrame. Instead you can just use pandas.NA (which is of type pandas._libs.missing.NAType). It is treated as a missing value within the DataFrame; outside a DataFrame it is a separate object that is neither None nor np.nan.
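A brief sketch of pd.NA on an assumed three-row frame. It shows that pd.isna recognises pd.NA, that pd.NA is distinct from None, and that using it in a Boolean context raises TypeError rather than returning True or False:

```python
import pandas as pd

# Assumed example data; no numpy import needed for the replacement itself.
df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

df = df.replace('N/A', pd.NA)
print(df['y'].isna().tolist())  # [False, True, False]

# Outside a DataFrame, pd.NA is still recognised by pd.isna,
# but it is not None, and its truth value is ambiguous.
print(pd.isna(pd.NA))  # True
print(pd.NA is None)   # False
try:
    bool(pd.NA)
except TypeError as exc:
    print('TypeError:', exc)
```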

df.loc[df.y == 'N/A',['y']] = np.nan

This solves your problem. With chained indexing (two separate [] calls, such as df['y'][mask] = np.nan), you may be working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.
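To set one specific cell rather than every cell matching a condition, the same single-call rule applies. A minimal sketch, assuming the default integer row labels 0, 1, 2 of a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Assumed example data with the default RangeIndex (row labels 0, 1, 2).
df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

# .at is the fast accessor for a single cell, addressed by row label and column name.
df.at[1, 'y'] = np.nan
# .loc does the same and also accepts lists and Boolean masks.
df.loc[1, 'y'] = np.nan

print(df['y'].isna().tolist())  # [False, True, False]
```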

