Bug in pandas query() method?

Tags:

python

pandas

dataframe

I was experimenting with several use cases for the pandas query() method and tried one argument that threw an exception, yet also caused an unwanted modification to the data in my DataFrame.

In [549]: syn_fmax_sort
Out[549]: 
     build_number      name    fmax
0             390     adpcm  143.45
1             390       aes  309.60
2             390     dfadd  241.02
3             390     dfdiv   10.80
....
211           413     dfmul  215.98
212           413     dfsin   11.94
213           413       gsm  194.70
214           413      jpeg  197.75
215           413      mips  202.39
216           413     mpeg2  291.29
217           413       sha  243.19

[218 rows x 3 columns]

I wanted to use query() to take out the subset of this dataframe where build_number is 392, so I tried:

In [550]: syn_fmax_sort.query('build_number = 392')

That threw a ValueError: cannot label index with a null key exception. Not only that, it left me with the full dataframe, and every build_number had been set to 392:

In [551]: syn_fmax_sort
Out[551]: 
     build_number      name    fmax
0             392     adpcm  143.45
1             392       aes  309.60
2             392     dfadd  241.02
3             392     dfdiv   10.80
....
211           392     dfmul  215.98
212           392     dfsin   11.94
213           392       gsm  194.70
214           392      jpeg  197.75
215           392      mips  202.39
216           392     mpeg2  291.29
217           392       sha  243.19

[218 rows x 3 columns]

However, I have since figured out how to get only the rows with value 392: if I use syn_fmax_sort.query('391 < build_number < 393'), it works.

So my question is: is the behavior I observed above, when I queried the dataframe incorrectly, due to a bug in the query() method?

asked Feb 25 '15 08:02

AKKO

1 Answer

It looks like you had a typo: you probably wanted to use == rather than =. A simple example shows the same problem:

In [286]:

df = pd.DataFrame({'a':np.arange(5)})
df
Out[286]:
   a
0  0
1  1
2  2
3  3
4  4
In [287]:

df.query('a = 3')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-287-41cfa0572737> in <module>()
----> 1 df.query('a = 3')

C:\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py in query(self, expr, **kwargs)
   1923             # when res is multi-dimensional loc raises, but this is sometimes a
   1924             # valid query
-> 1925             return self[res]
   1926 
   1927     def eval(self, expr, **kwargs):

C:\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1778             return self._getitem_multilevel(key)
   1779         else:
-> 1780             return self._getitem_column(key)
   1781 
   1782     def _getitem_column(self, key):

C:\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   1785         # get column
   1786         if self.columns.is_unique:
-> 1787             return self._get_item_cache(key)
   1788 
   1789         # duplicate columns & possible reduce dimensionaility

C:\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1066         res = cache.get(item)
   1067         if res is None:
-> 1068             values = self._data.get(item)
   1069             res = self._box_item_values(item, values)
   1070             cache[item] = res

C:\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   2856                         loc = indexer.item()
   2857                     else:
-> 2858                         raise ValueError("cannot label index with a null key")
   2859 
   2860             return self.iget(loc, fastpath=fastpath)

ValueError: cannot label index with a null key

It looks like, internally, it tries to build an index from the result of your query, checks its length, and, as that is 0, raises a ValueError (it probably should be a KeyError). I don't know how your query was evaluated, but perhaps assigning values to columns this way is simply unsupported at the moment.
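
For comparison, here is a minimal sketch of the corrected call on the same toy frame, using == so that the expression is a comparison rather than an assignment. The variable names (matched, same_rows, in_range, scratch) are illustrative only. How a single = inside query() behaves may well differ between pandas versions, so this sketch only covers the comparison forms and a defensive copy.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the example above
df = pd.DataFrame({'a': np.arange(5)})

# '==' is a comparison, so query() receives a boolean mask and filters rows
matched = df.query('a == 3')

# Equivalent boolean indexing without an expression string
same_rows = df[df['a'] == 3]

# Chained comparison, the form the question found to work
in_range = df.query('2 < a < 4')

# Defensive habit (an assumption about workflow, not a pandas requirement):
# try unfamiliar expressions on a copy so the original cannot be modified
scratch = df.copy()
matched_from_copy = scratch.query('a == 3')

print(matched)
```

All three selections should pick out the single row where a is 3, and df itself is left unchanged.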

answered Oct 10 '22 06:10

EdChum
