Pandas has both isna() and isnull(). I usually use isnull() to detect missing values and have never hit a case where I needed anything else.
So, when should I use isna()?
Pandas DataFrame isnull() Method: the isnull() method returns a DataFrame of the same shape in which every value is replaced with a Boolean: True for missing (NULL) values and False otherwise.
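To make the "same shape, Boolean values" description concrete, here is a minimal sketch. The frame and its column names ("a", "b") are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one numeric column, one text column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# Boolean mask with the same shape and labels as df.
mask = df.isnull()
print(mask)

# Summing the mask counts the missing values in each column.
missing_per_column = mask.sum()
print(missing_per_column)
```

Since True counts as 1, mask.sum() is a quick way to see how many values are missing per column.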
Detect missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are missing ( NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
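The cases listed in that description can be tried directly with the top-level function. This is only a small sketch with made-up input values.

```python
import numpy as np
import pandas as pd

# Scalar input gives a single Boolean.
scalar_result = pd.isna(np.nan)

# Numeric array: only NaN counts as missing.
numeric = pd.isna(np.array([1.0, np.nan, 3.0]))

# Object array: both None and NaN count as missing.
objects = pd.isna(np.array(["a", None, np.nan], dtype=object))

# Datetime-like input: None is stored as NaT, which counts as missing.
dates = pd.isna(pd.to_datetime(["2021-01-01", None]))

print(scalar_result, numeric, objects, dates)
```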
Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To facilitate this convention, there are several useful functions for detecting, removing, and replacing null values in a pandas DataFrame, among them isnull() and notnull().
You can use the isna() method to identify missing values, because it is the original implementation and isnull() is just an alias that internally calls isna(). To summarize, you’ve learned the difference between the isnull() and isna() methods of a pandas DataFrame, and which one to prefer.
Since isnull is an alias for isna, I would tend to prefer isna. Indeed, isna seems to be used more often than isnull. "There should be one-- and preferably only one --obvious way to do it." Presumably the same would apply to notna and notnull?
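As for notna and notnull, a quick check along these lines suggests the same relationship holds. It is a sketch using a made-up Series, not a statement about every pandas version.

```python
import numpy as np
import pandas as pd

# Top-level functions: check whether both names are bound to the same object.
same_function = pd.notnull is pd.notna

s = pd.Series([1.0, np.nan, 3.0])

# notna() should be the element-wise negation of isna().
negation_holds = s.notna().equals(~s.isna())

# The two method spellings should give identical results.
methods_agree = s.notnull().equals(s.notna())

print(same_function, negation_holds, methods_agree)
```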
However, in Python, pandas is built on top of NumPy, which has neither NA nor null values. Instead, NumPy has NaN values (NaN stands for "Not a Number"), and consequently pandas uses NaN as well. To detect NaN values, NumPy uses np.isnan(); pandas uses either .isna() or .isnull().
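One practical difference between the NumPy and pandas checks shows up on object arrays. The sketch below uses made-up values: np.isnan() only works on numeric input, while pd.isna() also accepts None and strings.

```python
import numpy as np
import pandas as pd

floats = np.array([1.0, np.nan])

# On a float array both checks agree.
numpy_mask = np.isnan(floats)
pandas_mask = pd.isna(floats)

# On an object array np.isnan raises TypeError, while pd.isna still works.
mixed = np.array(["a", None, np.nan], dtype=object)
try:
    np.isnan(mixed)
    numpy_handles_objects = True
except TypeError:
    numpy_handles_objects = False

mixed_mask = pd.isna(mixed)
print(numpy_mask, pandas_mask, numpy_handles_objects, mixed_mask)
```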
In R, NA and NULL are different things, so there are two different functions to check for them, and that is why pandas has two method names. In Python, on the other hand, pandas is built on top of NumPy, which has no NA or NULL values and uses np.nan to denote missing values.
isnull is an alias for isna. Literally, in the pandas source code:

    isnull = isna

Indeed:

    >>> pd.isnull
    <function isna at 0x7fb4c5cefc80>

So I would recommend using isna.
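If you want to confirm this on your own installation, a check along these lines should do. It is a minimal sketch, and the Series is made up for the example.

```python
import numpy as np
import pandas as pd

# Both top-level names should refer to the very same function object.
same_function = pd.isnull is pd.isna

# The function's own name is "isna", matching the repr shown above.
function_name = pd.isnull.__name__

# The Series methods should return identical masks.
s = pd.Series([1.0, np.nan, 3.0])
methods_agree = s.isnull().equals(s.isna())

print(same_function, function_name, methods_agree)
```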
The documentation for both is literally identical.
pandas.isna() : https://pandas.pydata.org/pandas-docs/stable/generated/pandas.isna.html#pandas.isna
pandas.isnull() : https://pandas.pydata.org/pandas-docs/stable/generated/pandas.isnull.html#pandas.isnull
The See also section of the following page even says that DataFrame.isnull is an alias of isna.
pandas.DataFrame.isnull(): https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.isnull.html#pandas.DataFrame.isnull
Therefore, they must be the same thing, like np.nan, np.NaN, np.NAN.
They are both the same. As a best practice, always prefer isna() over isnull().
It is easy to remember what isna() does because the NumPy function np.isnan() checks for NaN values. Pandas also has similarly named methods such as dropna() and fillna() that handle missing values, so the "na" naming is consistent and easy to remember.
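The "na" family can be seen side by side in a small sketch like the one below. The Series and the fill value 0.0 are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

detected = s.isna()     # True where a value is missing
present = s.notna()     # True where a value is present
dropped = s.dropna()    # missing entries removed
filled = s.fillna(0.0)  # missing entries replaced with 0.0

print(detected.tolist(), present.tolist(), dropped.tolist(), filled.tolist())
```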