How to use pandas isin for multiple columns

Tags: python, pandas


I want to find the (col1, col2) values that appear together, as the same pair, in both the first and the second dataframe.

These rows should be in the result dataframe:

  1. pizza, boy

  2. pizza, girl

  3. ice cream, boy

because each of these three (col1, col2) pairs appears in both the first and the second dataframe.

How can I accomplish this? I was thinking of using isin, but I am not sure how to use it when more than one column has to be considered.
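Calling isin separately on each column is not enough, because it only checks that each value occurs somewhere in the other frame, not that the values occur together in the same row. The minimal sketch below uses small hypothetical frames (`left` and `right` are illustrative names, not the question's data) and compares that column-by-column test with a pairwise test built from a Python set of tuples.

```python
import pandas as pd

# Hypothetical data, not the frames from the question.
left = pd.DataFrame({'col1': ['pizza', 'ice cream'], 'col2': ['boy', 'girl']})
right = pd.DataFrame({'col1': ['pizza', 'ice cream', 'cake'],
                      'col2': ['boy', 'boy', 'girl']})

# Column by column: 'ice cream' is somewhere in right.col1 and 'girl' is
# somewhere in right.col2, so the second row wrongly passes as well.
columnwise = left['col1'].isin(right['col1']) & left['col2'].isin(right['col2'])

# Pair as a unit: build a set of (col1, col2) tuples from `right` and test
# each row of `left` against it.
pairs = set(zip(right['col1'], right['col2']))
pairwise = pd.Series([pair in pairs for pair in zip(left['col1'], left['col2'])],
                     index=left.index)

print(columnwise.tolist())  # [True, True]
print(pairwise.tolist())    # [True, False]
```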

asked Jul 19 '17 at 18:07 by Jun Jang




2 Answers

Perform an inner merge on col1 and col2:

import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))

print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))

yields

            col1  col2
index                 
10         pizza   boy
11         pizza  girl
16     ice cream   boy

The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then

pd.merge(df2, df1, how='inner')
#         col1  col2
# 0      pizza   boy
# 1      pizza  girl
# 2  ice cream   boy

would suffice.
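One caveat with the merge approach: an inner merge emits one output row per matching row of df1, so if the same (col1, col2) pair occurs more than once in df1, the matching df2 row is repeated. A minimal sketch, assuming a hypothetical df1 in which ('pizza', 'boy') appears twice, shows the effect and one way to avoid it: drop duplicate key pairs before merging so the merge acts purely as a filter on df2.

```python
import pandas as pd

# Hypothetical variant of df1 in which the pair ('pizza', 'boy') occurs twice.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']},
                   index=range(10, 17))

# A plain inner merge emits one row per matching df1 row, so df2's
# ('pizza', 'boy') row (index 10) shows up twice.
naive = pd.merge(df2.reset_index(), df1, how='inner').set_index('index')

# Dropping duplicate key pairs first keeps each df2 row at most once.
keys = df1[['col1', 'col2']].drop_duplicates()
filtered = pd.merge(df2.reset_index(), keys, how='inner').set_index('index')

print(len(naive), len(filtered))  # 4 3
```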


Alternatively, you could construct MultiIndexes out of the col1 and col2 columns and then call the MultiIndex.isin method:

index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
print(df2.loc[index2.isin(index1)])

yields

         col1  col2
10      pizza   boy
11      pizza  girl
16  ice cream   boy
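The MultiIndex approach can be wrapped in a small reusable function. The sketch below is an illustration, not part of the answer: `pair_mask` is a hypothetical helper name, and it uses `pd.MultiIndex.from_frame` instead of `from_arrays` to build the index from a list of column names. Because it returns a boolean array, negating it with `~` gives the complement, i.e. the rows of df2 whose pair does not occur in df1.

```python
import pandas as pd

def pair_mask(left, right, cols):
    """Hypothetical helper: boolean array, True where the values of `cols`
    in a row of `left` occur together in some row of `right`."""
    left_idx = pd.MultiIndex.from_frame(left[cols])
    right_idx = pd.MultiIndex.from_frame(right[cols])
    return left_idx.isin(right_idx)

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']},
                   index=range(10, 17))

mask = pair_mask(df2, df1, ['col1', 'col2'])
matched = df2[mask]     # rows of df2 whose pair also appears in df1
unmatched = df2[~mask]  # rows of df2 with no matching pair in df1

print(matched.index.tolist())    # [10, 11, 16]
print(unmatched.index.tolist())  # [12, 13, 14, 15]
```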
answered Sep 23 '22 at 05:09 by unutbu


Thank you, unutbu! Here is a more compact variant that builds the MultiIndexes with set_index and filters df1 instead of df2.

import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
# Keep the rows of df1 whose (col1, col2) pair also appears in df2
print(df1[df1.set_index(['col1','col2']).index.isin(df2.set_index(['col1','col2']).index)])

yields:

        col1  col2
1      pizza   boy
4      pizza  girl
5  ice cream   boy
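The same filter on df1 can also be sketched with a left merge and `indicator=True`, a different technique from the isin calls above. The `_merge` column that pandas adds reports `'both'` when a row's pair was found in df2. The lookup keys are deduplicated first so that the left merge produces exactly one row per df1 row, in df1's order, which is what makes the resulting mask line up with df1; `cols`, `keys` and `marked` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']},
                   index=range(10, 17))

cols = ['col1', 'col2']
# Deduplicate the lookup keys so the left merge yields one row per df1 row.
keys = df2[cols].drop_duplicates()
marked = df1.merge(keys, on=cols, how='left', indicator=True)

# '_merge' is 'both' where the pair was found in df2, 'left_only' otherwise.
mask = (marked['_merge'] == 'both').to_numpy()
result = df1[mask]

print(result.index.tolist())  # [1, 4, 5]
```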
answered Sep 25 '22 at 05:09 by Ningrong Ye