How do I select rows that match values in multiple columns?
For example, we have the following df
k1 | k2 | v1 | v2
1 | 2 | 3 | 4
1 | 5 | 5 | 6
1 | 8 | 8 | 9
I am trying to select the middle row:
key_names = ["k1", "k2"]
keys = [1, 5]
selected_rows = df.loc[df[key_names].isin(keys)]
I get the following error:
ValueError: Cannot index with multidimensional key
The expected output is :
1 | 5 | 5 | 6
Thanks
TLDR
Use one of the following, based on your requirements:
df[(df[key_names] == keys).all(1)]
df[df[key_names].isin(keys).all(1)]
You're quite close, you have successfully created your mask, you just need to reduce it to a single dimension for indexing.
>>> df[key_names].isin(keys)
k1 k2
0 True False
1 True True
2 True False
You are only interested in rows where all values, are True
, and so you can reduce the dimension using all
across the first axis.
>>> df[key_names].isin(keys).all(1)
0 False
1 True
2 False
dtype: bool
The one caveat here is that isin
is not order dependent, so you would get the same results using another ordering of your values.
>>> df[key_names].isin([5, 1]).all(1)
0 False
1 True
2 False
dtype: bool
If you only want an exact ordering match, use ==
for broadcasted comparison, instead of isin
>>> (df[key_names] == keys).all(1)
0 False
1 True
2 False
dtype: bool
>>> (df[key_names] == [5, 1]).all(1)
0 False
1 False
2 False
dtype: bool
The last step here is using the 1D
mask you've created to index the original DataFrame:
>>> df[(df[key_names] == keys).all(1)]
k1 k2 v1 v2
1 1 5 5 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With