How do I select rows that match values in multiple columns?
For example, we have the following df
k1 | k2 | v1 | v2
1  | 2  | 3  | 4
1  | 5  | 5  | 6
1  | 8  | 8  | 9
I am trying to select the middle row:
key_names = ["k1", "k2"]
keys = [1, 5]
selected_rows = df.loc[df[key_names].isin(keys)]
I get the following error:
ValueError: Cannot index with multidimensional key
The expected output is :
1  | 5  | 5  | 6
Thanks
TLDR
Use one of the following, based on your requirements:
df[(df[key_names] == keys).all(1)]
df[df[key_names].isin(keys).all(1)]
You're quite close, you have successfully created your mask, you just need to reduce it to a single dimension for indexing.
>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False
You are only interested in rows where all values, are True, and so you can reduce the dimension using all across the first axis.
>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool
The one caveat here is that isin is not order dependent, so you would get the same results using another ordering of your values.
>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool
If you only want an exact ordering match, use == for broadcasted comparison, instead of isin
>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool
>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool
The last step here is using the 1D mask you've created to index the original DataFrame:
>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With