Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?

Tags: python, merge, pandas

A situation I sometimes run into: I have two dataframes (df1, df2) and I want to create a new dataframe (df3) containing the rows where several columns match between df1 and df2.

For example, I want to create df3 by keeping only the rows of df1 whose Campaign and Group values also appear in df2.

import pandas as pd
df1 = pd.DataFrame({'Campaign':['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'], 'Group':['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'], 'Metric':[245,91,292,373,32]}, columns=['Campaign', 'Group', 'Metric'])
df2 = pd.DataFrame({'Campaign':['Campaign 3', 'Campaign 3'], 'Group':['Group 1', 'Group 2'], 'Metric':[23, 456]}, columns=['Campaign', 'Group', 'Metric'])

df1

     Campaign            Group  Metric
0  Campaign 1       Some group     245
1  Campaign 2  Arbitrary Group      91
2  Campaign 3          Group 1     292
3  Campaign 3          Group 2     373
4  Campaign 4       Done Group      32

df2

     Campaign    Group  Metric
0  Campaign 3  Group 1      23
1  Campaign 3  Group 2     456

I know I can do this with merge...

df3 = df1.merge(df2, how='inner', on=['Campaign', 'Group'], suffixes=('','_del'))
#df3
     Campaign    Group  Metric  Metric_del
0  Campaign 3  Group 1     292          23
1  Campaign 3  Group 2     373         456

but then I have to figure out how to drop columns that end with _del. I guess this:

# keep only the columns whose name does not end with '_del'
df3.loc[:, ~df3.columns.str.endswith('_del')]
# The result I'm going for, but it required a merge and then a column filter (two steps)
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373

Questions

What I'm mainly interested in is returning df1 filtered down to the rows whose Campaign|Group values appear in df2.

Is there a better way to return the filtered df1 without resorting to merge?
Is there a way to merge that returns only df1's columns, without adding df2's columns to the result?

298

asked Aug 10 '15 17:08

Jarad

1 Answer

Assuming that df1 and df2 have exactly the same columns, you can set the join-key columns as the index on both frames, reindex the first frame by the second frame's index, and then call .dropna() to produce the intersection. This relies on the keys being unique.

df3 = df1.set_index(['Campaign', 'Group'])
df4 = df2.set_index(['Campaign', 'Group'])
# reindex first and dropna will produce the intersection
df3.reindex(df4.index).dropna(how='all').reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
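
To illustrate why the .dropna(how='all') step matters, here is a minimal, self-contained sketch. The frame df2_extra and its key ('Campaign 9', 'Group X') are hypothetical, added only to show what happens when df2 contains a key that df1 lacks.

```python
import pandas as pd

keys = ['Campaign', 'Group']

df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'],
    'Group': ['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'],
    'Metric': [245, 91, 292, 373, 32],
})

# Hypothetical variant of df2 with one key that does not exist in df1.
df2_extra = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 3', 'Campaign 9'],
    'Group': ['Group 1', 'Group 2', 'Group X'],
    'Metric': [23, 456, 7],
})

left = df1.set_index(keys)
right = df2_extra.set_index(keys)

# reindex returns one row per key in `right`; keys missing from `left`
# come back with NaN in every column.
reindexed = left.reindex(right.index)

# Dropping the all-NaN rows leaves only the keys present in both frames.
result = reindexed.dropna(how='all').reset_index()
print(result)
```

Two caveats under these assumptions: because reindex introduces NaN here, Metric is upcast to float, and dropna(how='all') would also discard any df1 row whose non-key columns are all missing to begin with.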

Edit:

Use .isin when the keys are not unique.

# create some duplicated keys and values
# (pd.concat is used because DataFrame.append was removed in pandas 2.0)
df3 = pd.concat([df3, df3])
df4 = pd.concat([df4, df4])

# isin
df3[df3.index.isin(df4.index)].reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
2  Campaign 3  Group 1     292
3  Campaign 3  Group 2     373
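
As a hedged sketch of the two questions asked above, the same filter can be written directly against df1 without changing its index, and a merge can be limited to df1's columns by passing only df2's key columns. The names keys, filtered_isin and filtered_merge are illustrative, not part of the original code.

```python
import pandas as pd

keys = ['Campaign', 'Group']

df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'],
    'Group': ['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'],
    'Metric': [245, 91, 292, 373, 32],
})
df2 = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 3'],
    'Group': ['Group 1', 'Group 2'],
    'Metric': [23, 456],
})

# Option 1: no merge. Build a MultiIndex from the key columns of each frame
# and keep the df1 rows whose key tuple appears in df2.
mask = pd.MultiIndex.from_frame(df1[keys]).isin(pd.MultiIndex.from_frame(df2[keys]))
filtered_isin = df1[mask]  # keeps df1's original row labels

# Option 2: merge, but only against df2's key columns, so no df2 value
# columns (and no suffixes) reach the result. drop_duplicates prevents
# repeated keys in df2 from multiplying df1's rows.
filtered_merge = df1.merge(df2[keys].drop_duplicates(), how='inner', on=keys)

print(filtered_isin)
print(filtered_merge)
```

The two results hold the same rows; the difference is that the mask keeps df1's original index while merge returns a fresh 0-based index.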

196

answered Sep 23 '22 00:09

Jianxun Li
