I am trying to select the rows of df where the column label has the value None. (The value None was obtained from another function; it is not NaN.) Why does df[df['label'].isnull()] return the rows I wanted, but df[df['label'] == None] returns
Empty DataFrame
Columns: [path, fanId, label, gain, order]
Index: []
?
As the comment above states, missing data in pandas is represented by NaN, which is a numerical value, i.e. a float. None, however, is a Python object of type NoneType, so NaN is not equal to None.
In [27]: np.nan == None
Out[27]: False
This GitHub thread discusses it further, noting:
This was done quite a while ago to make the behavior of nulls consistent, in that they don't compare equal. This puts None and np.nan on an equal (though not-consistent with python, BUT consistent with numpy) footing.
This means that when you do df[df['label'] == None], pandas goes elementwise and treats each None like a NaN, so it is effectively checking np.nan == np.nan, which we know is False.
In [63]: np.nan == np.nan
Out[63]: False
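To make the difference concrete, here is a minimal sketch of the two masks side by side. The frame and its values are hypothetical (only the column names path and label are borrowed from the question), and the label column is built with dtype=object so that None is stored as-is.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "path": ["a.wav", "b.wav", "c.wav"],
    "label": pd.Series(["x", None, "y"], dtype=object),
})

# Elementwise equality: pandas treats None as a null here,
# and nulls never compare equal, so nothing matches.
eq_mask = df["label"] == None  # noqa: E711 (deliberately the non-working form)

# isnull() flags both None and NaN as missing.
null_mask = df["label"].isnull()

print(eq_mask.tolist())
print(null_mask.tolist())
print(df[null_mask])
```

The first mask selects no rows, which is why the question sees an empty DataFrame, while the isnull() mask selects the row that holds None.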
Additionally, you should not use df[df['label'] == None] for Boolean indexing. Using == with None is not best practice, as PEP 8 mentions:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
For example, you could do tst.value.apply(lambda x: x is None), which yields the same outcome as .isnull(), illustrating how pandas treats these values as NaNs. Note that this is for the tst dataframe example below, where tst.value has dtype object and I have explicitly specified the None elements.
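As a rough sketch of that comparison, the snippet below rebuilds a tst frame like the one in this answer and checks the identity test against .isnull(). Passing dtype=object for the value column is my assumption, made so that None is kept as None regardless of the pandas version.

```python
import pandas as pd

tst = pd.DataFrame({
    "label": [1, 2, None, 3, None],
    # dtype=object keeps the None elements as real NoneType objects
    "value": pd.Series(["A", "B", None, "C", None], dtype=object),
})

# Identity check per element, as PEP 8 recommends for None.
is_none = tst["value"].apply(lambda x: x is None)

# pandas' own missing-value check.
is_null = tst["value"].isnull()

# In the numeric column None was converted to NaN,
# so an identity check against None finds nothing there.
label_is_none = tst["label"].apply(lambda x: x is None)

print(is_none.tolist())
print(is_null.tolist())
print(label_is_none.tolist())
```

The two masks agree on the object column, but the identity check misses the missing values in the float column, which is one reason .isnull() is the safer general-purpose choice.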
There is a nice example in the pandas docs which illustrates this and its effect. For example, if you have two columns, one of type float and the other of type object, you can see how pandas deals with None in a sensible way: in the float column it is stored as NaN.
In [32]: tst = pd.DataFrame({"label" : [1, 2, None, 3, None], "value" : ["A", "B", None, "C", None]})
In [39]: tst
Out[39]:
   label value
0    1.0     A
1    2.0     B
2    NaN  None
3    3.0     C
4    NaN  None
In [51]: type(tst.value[2])
Out[51]: NoneType
In [52]: type(tst.label[2])
Out[52]: numpy.float64
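A short sketch of the practical consequence: whichever representation a column ends up with, pd.isna (the same check as .isnull()) reports it as missing. As before, dtype=object on the value column is my assumption, used to keep None unconverted.

```python
import pandas as pd

tst = pd.DataFrame({
    "label": [1, 2, None, 3, None],
    "value": pd.Series(["A", "B", None, "C", None], dtype=object),
})

# The numeric column was upcast to float so it can hold NaN.
print(tst.dtypes)

# pd.isna works on scalars too and treats NaN and None alike.
label_missing = pd.isna(tst["label"][2])
value_missing = pd.isna(tst["value"][2])
print(label_missing, value_missing)

# Count of missing entries per column.
missing_counts = tst.isna().sum()
print(missing_counts)
```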
This post explains the difference between NaN and None really well; I would definitely take a look at it.