I have two dataframes of similar format:
from pandas import DataFrame
df1 = DataFrame({'a':[0,1,2,3,4], 'b':['q','r','s','t','u']})
df1
a b
0 0 q
1 1 r
2 2 s
3 3 t
4 4 u
df2 = DataFrame({'a':[4,3,2,1,999], 'b':['u','r','s','t','u']})
df2
a b
0 4 u
1 3 r
2 2 s
3 1 t
4 999 u
I would like to get a new dataframe containing the rows that appear in both of these (ignoring the index). So the above example gives the dataframe
a b
0 4 u
1 2 s
How do I get this intersection?
You can just perform a merge; by default merge joins on all common columns and performs an inner join, so only rows present in both dfs are kept:
In [71]:
df1.merge(df2)
Out[71]:
a b
0 2 s
1 4 u
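Putting it together as a self-contained script (using the df1/df2 from the question):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': ['q', 'r', 's', 't', 'u']})
df2 = pd.DataFrame({'a': [4, 3, 2, 1, 999], 'b': ['u', 'r', 's', 't', 'u']})

# With no 'on' argument, merge joins on all shared column names ('a' and 'b'),
# and the default how='inner' keeps only rows that occur in both frames.
common = df1.merge(df2)
print(common)
#    a  b
# 0  2  s
# 1  4  u
```

One caveat: if either frame contains duplicate rows, an inner merge can multiply them (each matching pair produces a row). If you want each common row to appear once, chain `.drop_duplicates()` onto the result.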