Python Pandas: Boolean indexing on multiple columns [duplicate]

Despite there being at least two good tutorials on how to index a DataFrame in Python's pandas library, I still can't work out an elegant way of SELECTing on more than one column.

>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have found (what I think is) a rather inelegant way of doing it, like this:

>>> d[d['x']>2][d['y']>7]

But it's not pretty, and it scores fairly low for readability (I think).

Is there a better, more Python-tastic way?

LondonRob asked Jun 20 '13 14:06


2 Answers

It is an operator precedence issue: & binds more tightly than comparison operators such as >.

You should add extra parentheses to make your multi-condition test work:

d[(d['x']>2) & (d['y']>7)]

This section of the tutorial you mentioned shows an example with several boolean conditions, and parentheses are used there.
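
To make the precedence point concrete: without parentheses, Python reads d['x']>2 & d['y']>7 as the chained comparison d['x'] > (2 & d['y']) > 7, which implicitly applies "and" to two Series and so asks for the truth value of a whole Series. Here is a minimal sketch reusing the d from the question; the names mask_x, mask_y, failed and result are only for illustration.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Without parentheses, & is evaluated before >, so this becomes the chained
# comparison d['x'] > (2 & d['y']) > 7, which needs the truth value of a Series.
try:
    d[d['x'] > 2 & d['y'] > 7]
    failed = False
except ValueError:
    failed = True

# With each comparison evaluated first (here by naming the masks),
# & combines the two boolean Series element-wise.
mask_x = d['x'] > 2
mask_y = d['y'] > 7
result = d[mask_x & mask_y]

print(failed)
print(result)
```

Naming the masks is equivalent to the parenthesized one-liner above and can read better when there are more than two conditions. On this data the selection keeps only the row with x=5, y=8.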

Zeugma answered Oct 28 '22 15:10


There may still be a better way, but

In [56]: d[(d['x'] > 2) & (d['y'] > 7)]
Out[56]: 
   x  y
4  5  8

works.
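
If spelling the condition with the word "and" reads more naturally, DataFrame.query accepts it inside its expression string. This is a sketch that assumes a pandas version recent enough to provide query; the names by_query and by_mask are illustrative only.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Inside the query string, column names are used directly and
# `and` is applied element-wise across rows.
by_query = d.query('x > 2 and y > 7')

# The same selection written with explicit boolean masks, for comparison.
by_mask = d[(d['x'] > 2) & (d['y'] > 7)]

print(by_query.equals(by_mask))
```

Both spellings select the same rows; the mask form is handy when the conditions are built programmatically, while the query string tends to be shorter to read.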

TomAugspurger answered Oct 28 '22 15:10