I have two Python pandas DataFrames, A and B, with the same columns (but different data). I want to check whether A is a subset of B, that is, whether all rows of A are contained in B.
Any idea how to do it?
The method DataFrame.merge(another_DF) merges on the intersection of the columns by default (it uses all columns with the same names in both DataFrames) and uses how='inner' by default. So, provided neither DataFrame has duplicated rows, the inner join should have exactly as many rows as A when every row of A is in B:
len(A.merge(B)) == len(A)
PS: this will NOT work properly if either DataFrame has duplicated rows; see below for such cases.
Demo:
In [128]: A
Out[128]:
   A  B  C
0  1  2  3
1  4  5  6
In [129]: B
Out[129]:
   A  B  C
0  4  5  6
1  1  2  3
2  9  8  7
In [130]: len(A.merge(B)) == len(A)
Out[130]: True
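Below is a minimal sketch that wraps the length check in a function (is_row_subset is a hypothetical name, not a pandas API) and shows how duplicates break it in both directions: a duplicated row in B inflates the merge and gives a false negative, and a duplicated row in B can also hide a missing row of A and give a false positive.

```python
import pandas as pd

def is_row_subset(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    # Inner merge on all shared columns (the DataFrame.merge defaults).
    # Only reliable when neither frame contains duplicated rows.
    return len(a.merge(b)) == len(a)

cols = ["A", "B", "C"]
A = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)
B = pd.DataFrame([[4, 5, 6], [1, 2, 3], [9, 8, 7]], columns=cols)
print(is_row_subset(A, B))  # True

# False negative: (4, 5, 6) now appears twice in B_dup, so it matches twice
# and the merge has 3 rows instead of 2, although A is still a subset.
B_dup = pd.concat([B, B.iloc[[0]]], ignore_index=True)
print(is_row_subset(A, B_dup))  # False

# False positive: (7, 7, 7) is not in B2, but (1, 2, 3) matches twice,
# so the merge still has 2 rows and the lengths agree.
A2 = pd.DataFrame([[1, 2, 3], [7, 7, 7]], columns=cols)
B2 = pd.DataFrame([[1, 2, 3], [1, 2, 3]], columns=cols)
print(is_row_subset(A2, B2))  # True, although A2 is not a subset of B2
```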
For data sets containing duplicates, we can drop the duplicates first and then use the same method:
In [136]: A
Out[136]:
   A  B  C
0  1  2  3
1  4  5  6
2  1  2  3
In [137]: B
Out[137]:
   A  B  C
0  4  5  6
1  1  2  3
2  9  8  7
3  4  5  6
In [138]: A.merge(B).drop_duplicates()
Out[138]:
   A  B  C
0  1  2  3
2  4  5  6
In [139]: len(A.merge(B).drop_duplicates()) == len(A.drop_duplicates())
Out[139]: True
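A variant that handles duplicates without comparing lengths, offered as a sketch rather than the approach used above: deduplicate B, left-merge A onto it with merge's indicator=True parameter, and check that every row of A is tagged "both". The helper name is_row_subset is hypothetical, and it assumes A has no column already named "_merge".

```python
import pandas as pd

def is_row_subset(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    # Dropping duplicates from b means each row of a matches at most once,
    # so the left merge keeps exactly one output row per row of a.
    merged = a.merge(b.drop_duplicates(), how="left", indicator=True)
    # The _merge column is "both" for rows of a found in b, "left_only" otherwise.
    return bool((merged["_merge"] == "both").all())

cols = ["A", "B", "C"]
A = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 3]], columns=cols)
B = pd.DataFrame([[4, 5, 6], [1, 2, 3], [9, 8, 7], [4, 5, 6]], columns=cols)
print(is_row_subset(A, B))  # True

# Append a row that does not occur in B.
A_extra = pd.concat([A, pd.DataFrame([[0, 0, 0]], columns=cols)], ignore_index=True)
print(is_row_subset(A_extra, B))  # False
```

A side benefit of the indicator column is that filtering the merged frame on _merge == "left_only" lists exactly which rows of A are missing from B.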
You can also try:
import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})
ex2 = ex.iloc[0:2]
ex2.isin(ex).all().all()
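It returns True. Note that isin with a DataFrame argument compares cells that share the same index and column labels, so this only works when the rows of ex2 keep the index labels they have in ex; it is not an order-independent row-membership test.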
If you swap some values, such as tv and phone, you get False:
ex2 = pd.DataFrame({"col1": ["banana", "tomato"],
                    "col2": ["cat", "dog"],
                    "col3": ["phone", "tv"]})
ex2.isin(ex).all().all()
>> False
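The following sketch illustrates how label alignment affects the isin check. ex3 is a hypothetical frame holding the same two rows as ex.iloc[0:2], but in reverse order and renumbered from 0. Even though both rows appear in ex, isin reports False for ex3. Keeping the original index labels makes the check pass again.

```python
import pandas as pd

ex = pd.DataFrame({"col1": ["banana", "tomato", "apple"],
                   "col2": ["cat", "dog", "kangoo"],
                   "col3": ["tv", "phone", "ps4"]})

# Same rows as ex.iloc[0:2], reversed and re-indexed as 0, 1.
ex3 = ex.iloc[[1, 0]].reset_index(drop=True)

# Cells are compared with the cell in ex that has the same labels,
# so row 0 of ex3 (tomato, dog, phone) is compared with row 0 of ex (banana, ...).
print(ex3.isin(ex).all().all())  # False

# Same reversed rows, but keeping their original labels 1 and 0: cells line up.
print(ex.iloc[[1, 0]].isin(ex).all().all())  # True
```

If row order and index labels cannot be guaranteed, the merge-based checks from the first answer are the safer choice.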