Say I have a dataframe df with a column value holding some float values and some NaN. How can I get the part of the dataframe where we have NaN using the query syntax?
The following, for example, does not work:
df.query( '(value < 10) or (value == NaN)' )
I get "name 'NaN' is not defined" (same for df.query('value == NaN')).
Generally speaking, is there any way to use numpy names in query, such as inf, nan, pi, e, etc.?
In general, you could use @local_variable_name, so something like
>>> import numpy as np
>>> import pandas as pd
>>> pi = np.pi; nan = np.nan
>>> df = pd.DataFrame({"value": [3,4,9,10,11,np.nan,12]})
>>> df.query("(value < 10) and (value > @pi)")
   value
1      4
2      9
would work, but NaN isn't equal to anything, including itself, so value == @nan will always be false. One way to hack around this is to use that fact and use value != value as an isnan check. We have
>>> df.query("(value < 10) or (value == @nan)")
   value
0      3
1      4
2      9
but
>>> df.query("(value < 10) or (value != value)")
   value
0      3
1      4
2      9
5    NaN
According to this answer you can use:
df.query('value < 10 | value.isnull()', engine='python')
I verified that it works.
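For completeness, here is a minimal, self-contained sketch of that approach, reusing the sample frame from the answer above (the column name value is just the one from the question):
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})
>>> # engine='python' lets query() call Series methods such as .isnull()
>>> df.query("value < 10 | value.isnull()", engine="python")  # keeps the 3, 4, 9 rows and the NaN row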
For rows where value is not null:
df.query("value == value")
For rows where value is null:
df.query("value != value")