I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are both in the second dataframe.
These rows should be in the result dataframe:
pizza, boy
pizza, girl
ice cream, boy
because all three rows are in the first and second dataframes.
How can I accomplish this? I was thinking of using isin, but I am not sure how to use it when I have to consider more than one column.
pandas MultiIndex key points: both the index and the column labels can have multiple levels. Multi-level columns are used when you want to group columns together.
pandas DataFrame isin() method: isin() checks whether each value in the DataFrame is one of the specified values. It returns a DataFrame of the same shape as the original, with each value replaced by True if it was one of the specified values and False otherwise.
isin() exists on both pandas DataFrame and Series and checks whether each element is contained in the given values, which can be a list, a Series, or (for a DataFrame) a dict. It returns an object of the same type and shape as the caller, filled with booleans indicating whether each element is in values.
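To illustrate why calling isin on each column separately is not enough here, the following is a minimal sketch. The frames left and right are made up for this illustration (they are not the frames from the question) and are chosen so that the per-column check gives a false positive.

```python
import pandas as pd

# Hypothetical frames: every col1 value and every col2 value of `left`
# occurs somewhere in `right`, but no (col1, col2) pair does.
left = pd.DataFrame({'col1': ['pizza', 'ice cream'], 'col2': ['boy', 'girl']})
right = pd.DataFrame({'col1': ['pizza', 'ice cream'], 'col2': ['girl', 'boy']})

# Each column is tested on its own, so the pairing of col1 with col2 is lost.
per_column = left['col1'].isin(right['col1']) & left['col2'].isin(right['col2'])

# Comparing (col1, col2) tuples keeps the pairing intact.
right_pairs = set(zip(right['col1'], right['col2']))
per_pair = pd.Series(
    [pair in right_pairs for pair in zip(left['col1'], left['col2'])],
    index=left.index,
)

print(per_column.tolist())  # [True, True]
print(per_pair.tolist())    # [False, False]
```

The per-column mask marks both rows as matches even though neither pair exists in right, which is why the rows have to be compared as whole (col1, col2) pairs.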
Perform an inner merge on col1 and col2:
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))
yields
col1 col2
index
10 pizza boy
11 pizza girl
16 ice cream boy
The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then
pd.merge(df2, df1, how='inner')
# col1 col2
# 0 pizza boy
# 1 pizza girl
# 2 ice cream boy
would suffice.
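One caveat with the merge approach: if a (col1, col2) pair occurs more than once in df1, the inner merge repeats the matching df2 row once per occurrence. The sketch below uses small made-up frames (not the ones above) to show this, and passes the key columns explicitly through on= together with drop_duplicates as one possible way around it.

```python
import pandas as pd

# Hypothetical data: df1 contains the pair ('pizza', 'boy') twice.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza'], 'col2': ['boy', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'cake'], 'col2': ['boy', 'girl']}, index=[10, 11])

keys = ['col1', 'col2']

# Plain inner merge: the single matching df2 row appears once per df1 duplicate.
plain = pd.merge(df2, df1, how='inner', on=keys)

# Dropping duplicate key pairs from df1 first keeps one output row per df2 row.
deduped = pd.merge(df2, df1[keys].drop_duplicates(), how='inner', on=keys)

print(len(plain))    # 2
print(len(deduped))  # 1
```

With the frames from the question this makes no difference, because every pair in df1 is unique.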
Alternatively, you could construct MultiIndexes out of the col1 and col2 columns, and then call the MultiIndex.isin method:
index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
print(df2.loc[index2.isin(index1)])
yields
col1 col2
10 pizza boy
11 pizza girl
16 ice cream boy
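If you need this in more than one place, the MultiIndex.isin idea can be wrapped in a small helper. This is only a sketch: the function name rows_in_both is made up for illustration, and it builds the keys with pd.MultiIndex.from_frame (available in pandas 0.24 and later) instead of from_arrays.

```python
import pandas as pd

def rows_in_both(left, right, cols):
    """Return the rows of `left` whose values in `cols` appear together in `right`."""
    # Build one key per row from the selected columns of each frame.
    left_keys = pd.MultiIndex.from_frame(left[cols])
    right_keys = pd.MultiIndex.from_frame(right[cols])
    # isin returns a boolean array aligned with the rows of `left`.
    return left.loc[left_keys.isin(right_keys)]

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

result = rows_in_both(df2, df1, ['col1', 'col2'])
print(result.index.tolist())  # [10, 11, 16]
```

Because the mask is applied with .loc, the original index of the first argument is kept, and swapping the two frames returns the matching rows of df1 instead.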
Thank you unutbu! Here is a little update.
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
df1[df1.set_index(['col1','col2']).index.isin(df2.set_index(['col1','col2']).index)]
returns:
col1 col2
1 pizza boy
4 pizza girl
5 ice cream boy