Say I have a dataframe df with a column value holding some float values and some NaN. How can I get the part of the dataframe where we have NaN using the query syntax?
The following, for example, does not work:
df.query( '(value < 10) or (value == NaN)' )
I get "name 'NaN' is not defined" (same for df.query('value == NaN')).
Generally speaking, is there any way to use numpy names in query, such as inf, nan, pi, e, etc.?
In general, you could use @local_variable_name, so something like
>>> import numpy as np
>>> import pandas as pd
>>> pi = np.pi; nan = np.nan
>>> df = pd.DataFrame({"value": [3,4,9,10,11,np.nan,12]})
>>> df.query("(value < 10) and (value > @pi)")
   value
1      4
2      9
would work, but NaN isn't equal to anything, including itself, so value == @nan will always be false. One way to hack around this is to use that fact and use value != value as an isnan check. We have
>>> df.query("(value < 10) or (value == @nan)")
   value
0      3
1      4
2      9
but
>>> df.query("(value < 10) or (value != value)")
   value
0      3
1      4
2      9
5    NaN
According to this answer you can use:
df.query('value < 10 | value.isnull()', engine='python')
I verified that it works.
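For completeness, here is a minimal, self-contained sketch of that approach, reusing the sample frame from the answer above (the column name value is just the one from the question):
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})
>>> # engine='python' lets query() call Series methods such as .isnull()
>>> df.query("value < 10 | value.isnull()", engine="python")  # keeps the 3, 4, 9 rows and the NaN row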
For rows where value is not null:
df.query("value == value")
For rows where value is null:
df.query("value != value")