 

Merge two python pandas data frames of different length but keep all rows in output data frame

I have the following problem: I have two pandas data frames of different length containing some rows and columns that have common values and some that are different, like this:

df1:

   Column1  Column2  Column3
0  a        x        x
1  c        x        x
2  e        x        x
3  d        x        x
4  h        x        x
5  k        x        x

df2:

   ColumnA  ColumnB  ColumnC
0  c        y        y
1  e        z        z
2  a        s        s
3  d        f        f
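For anyone who wants to try the answers locally, here is a minimal sketch that rebuilds the two sample frames with the values from the question's tables. The column names and values come from the question; nothing else is assumed.

```python
import pandas as pd

# df1: six rows, keyed by Column1 (values copied from the question)
df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})

# df2: four rows, keyed by ColumnA (values copied from the question)
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

print(df1.shape, df2.shape)  # (6, 3) (4, 3)
```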

What I want to do now is merge the two data frames so that, whenever ColumnA and Column1 have the same value, the row from df2 is appended to the corresponding row in df1, like this:

df1:

   Column1  Column2  Column3  ColumnB  ColumnC
0  a        x        x        s        s
1  c        x        x        y        y
2  e        x        x        z        z
3  d        x        x        f        f
4  h        x        x        NaN      NaN
5  k        x        x        NaN      NaN

I know that the merge can be done with

df1.merge(df2,left_on='Column1', right_on='ColumnA') 

but this command drops every row whose value in Column1 has no match in ColumnA (and vice versa). Instead, I want to keep these rows of df1 and just fill the df2 columns with NaN for them, as shown above. Is there a smooth way to do this in pandas?
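The row loss described here follows from merge() defaulting to an inner join. A minimal sketch using the question's data shows that the unmatched keys "h" and "k" disappear from the result:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# No `how` argument, so merge() uses how="inner":
# only keys present in BOTH Column1 and ColumnA survive.
inner = df1.merge(df2, left_on="Column1", right_on="ColumnA")

# "h" and "k" have no partner in ColumnA, so they are missing.
print(sorted(inner["Column1"]))  # ['a', 'c', 'd', 'e']
print(len(inner))                # 4
```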

Thanks in advance!

asked Oct 12 '15 at 17:10 by sequence_hard



People also ask

How do I merge two DataFrames with different lengths in pandas?

It can be done with the merge() method; the two data frames do not need to have the same number of rows. Which unmatched rows are kept depends on the how argument (the default, 'inner', keeps only keys found in both frames).

How do I merge two DataFrames with different dimensions?

If you want to combine two data frames with different column types and different lengths into one data frame, you can use cbind (in R), but first make both the same length by padding the shorter data frame with NA at the end.
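cbind is an R function. In pandas, a comparable side-by-side combination is pd.concat with axis=1, which aligns rows on the index and fills rows missing from one frame with NaN, so no manual padding is needed. A minimal sketch with hypothetical frames `left` and `right`, assuming both use the default integer index:

```python
import pandas as pd

# Hypothetical frames with different numbers of rows.
left = pd.DataFrame({"name": ["a", "b", "c"]})
right = pd.DataFrame({"score": [1, 2]})

# axis=1 places the frames side by side, aligning rows on the index.
# Index label 2 exists only in `left`, so `score` is NaN in that row.
combined = pd.concat([left, right], axis=1)
print(combined)
```

Because alignment is by index label rather than by position, frames with unrelated indexes may need `reset_index(drop=True)` first.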

How do you concatenate two DataFrames with different columns in pandas?

Data frames with different columns can be combined with the concat() method. Its main parameters are: objs, the data frames to combine; axis, where 0 refers to the row axis and 1 to the column axis; and join, the type of join ('outer' or 'inner').
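A minimal sketch of how the join parameter behaves when stacking rows (axis=0) from two hypothetical frames that share only a `key` column:

```python
import pandas as pd

# Hypothetical frames that share only the "key" column.
top = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
bottom = pd.DataFrame({"key": ["c"], "y": [3]})

# join="outer" (the default) keeps the union of columns,
# filling cells a frame does not have with NaN.
outer = pd.concat([top, bottom], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns common to all frames.
inner = pd.concat([top, bottom], axis=0, join="inner", ignore_index=True)

print(list(outer.columns))  # ['key', 'x', 'y']
print(list(inner.columns))  # ['key']
```

`ignore_index=True` renumbers the stacked rows 0..n-1 instead of repeating each frame's original labels.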

How do I merge two DataFrames in Python with different column names?

When the join columns have different names, pass them to merge() with the left_on and right_on parameters instead of the single on parameter.


1 Answer

You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:

df1.merge(df2,how='left', left_on='Column1', right_on='ColumnA') 
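A minimal sketch applying this left join to the question's data. The extra `drop(columns="ColumnA")` step is an addition, not part of the answer above; it removes the duplicated key column so the result matches the layout the question asked for:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# how="left" keeps every row of df1 in its original order; rows without
# a match in ColumnA get NaN in the columns that come from df2.
result = df1.merge(df2, how="left", left_on="Column1", right_on="ColumnA")

# The key now appears twice (Column1 and ColumnA); drop the df2 copy.
result = result.drop(columns="ColumnA")
print(result)
```

If ColumnA could contain duplicates, a df1 row would be repeated once per match; passing `validate="many_to_one"` to merge() makes pandas raise an error in that case instead.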
answered Oct 15 '22 at 03:10 by Sina
