Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do "(df1 & not df2)" dataframe merge in pandas?

I have 2 pandas dataframes df1 & df2 with common columns/keys (x,y).

I want to merge do a "(df1 & not df2)" kind of merge on keys (x,y), meaning I want my code to return a dataframe containing rows with (x,y) only in df1 & not in df2.

SAS has an equivalent functionality

data final;
merge df1(in=a) df2(in=b);
by x y;
if a & not b;
run;

Who to replicate the same functionality in pandas elegantly? It would have been great if we can specify how="left-right" in merge().

like image 355
GeorgeOfTheRF Avatar asked Sep 20 '15 05:09

GeorgeOfTheRF


People also ask

How do I merge two data frames?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.

How do I combine two DF with different columns?

It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.

How do I drop a row in a data frame?

To drop a row or column in a dataframe, you need to use the drop() method available in the dataframe. You can read more about the drop() method in the docs here. Rows are labelled using the index number starting with 0, by default. Columns are labelled using names.

How do you append to a data frame?

Dataframe append syntax Using the append method on a dataframe is very simple. You type the name of the first dataframe, and then . append() to call the method. Then inside the parenthesis, you type the name of the second dataframe, which you want to append to the end of the first.


1 Answers

I just upgraded to version 0.17.0 RC1 which was released 10 days ago. Just found out that pd.merge() have new argument in this new release called indicator=True to acheive this in pandonic way!!

df=pd.merge(df1,df2,on=['x','y'],how="outer",indicator=True)
df=df[df['_merge']=='left_only']

indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame, right_only for observations whose merge key only appears in 'right' DataFrame, and both if the observation’s merge key is found in both.

http://pandas-docs.github.io/pandas-docs-travis/merging.html#database-style-dataframe-joining-merging

like image 114
GeorgeOfTheRF Avatar answered Oct 29 '22 05:10

GeorgeOfTheRF