Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows that match values in multiple columns in pandas

Tags:

python

pandas

How do I select rows that match values in multiple columns?

For example, we have the following df

k1 | k2 | v1 | v2
1  | 2  | 3  | 4
1  | 5  | 5  | 6
1  | 8  | 8  | 9

I am trying to select the middle row:

key_names = ["k1", "k2"]
keys = [1, 5]
selected_rows = df.loc[df[key_names].isin(keys)]

I get the following error:

ValueError: Cannot index with multidimensional key

The expected output is :

1  | 5  | 5  | 6

Thanks

like image 646
YohanRoth Avatar asked Jul 25 '19 19:07

YohanRoth


1 Answers

TLDR

Use one of the following, based on your requirements:

df[(df[key_names] == keys).all(1)]

df[df[key_names].isin(keys).all(1)]

You're quite close, you have successfully created your mask, you just need to reduce it to a single dimension for indexing.

>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False

You are only interested in rows where all values, are True, and so you can reduce the dimension using all across the first axis.

>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool

The one caveat here is that isin is not order dependent, so you would get the same results using another ordering of your values.

>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool

If you only want an exact ordering match, use == for broadcasted comparison, instead of isin

>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool

>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool

The last step here is using the 1D mask you've created to index the original DataFrame:

>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
like image 83
user3483203 Avatar answered Oct 20 '22 22:10

user3483203