I have the following problem: I have two pandas data frames of different lengths. Some of their rows and columns share values and some do not, like this:
df1:
  Column1 Column2 Column3
0       a       x       x
1       c       x       x
2       e       x       x
3       d       x       x
4       h       x       x
5       k       x       x

df2:
  ColumnA ColumnB ColumnC
0       c       y       y
1       e       z       z
2       a       s       s
3       d       f       f
What I want to do now is merge the two data frames so that, wherever ColumnA and Column1 have the same value, the values from the matching df2 row are appended to the corresponding row in df1, like this:
df1:
  Column1 Column2 Column3 ColumnB ColumnC
0       a       x       x       s       s
1       c       x       x       y       y
2       e       x       x       z       z
3       d       x       x       f       f
4       h       x       x     NaN     NaN
5       k       x       x     NaN     NaN
I know that the merge can be done with
df1.merge(df2,left_on='Column1', right_on='ColumnA')
but this command drops every row whose key does not appear in both Column1 and ColumnA. Instead, I want to keep those rows of df1 and just fill the columns that come from df2 with NaN, as shown above. Is there a smooth way to do this in pandas?
Thanks in advance!
It can be done using the merge() method, which works for data frames of different lengths.
In R, if you want to combine two data frames of different lengths side by side (for example, when one column has a different data type), you can use cbind(), but first make the columns the same length by padding the end of the shorter data frame with NA.
It is also possible to join different columns using the concat() method. Its main parameters are: the data frames to combine; axis, where 0 refers to the row axis and 1 refers to the column axis; and join, the type of join ('inner' or 'outer').
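Applied to the question's data, a minimal concat() sketch might look like the following. It assumes the key values are unique within each frame, because concat(axis=1) aligns rows on the index rather than on column values and raises an error on duplicate index labels. The keys are therefore moved into the index first.

```python
import pandas as pd

# Rebuild the example data frames from the question
df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# concat(axis=1) aligns on the index, so move the key columns into the index
left = df1.set_index("Column1")
right = df2.set_index("ColumnA")

# join="outer" keeps keys from either frame; cells with no match become NaN
combined = pd.concat([left, right], axis=1, join="outer")

# Restore df1's row order and turn the key back into a regular column
combined = (
    combined.reindex(df1["Column1"].tolist())
    .rename_axis("Column1")
    .reset_index()
)
print(combined)
```

For this particular question, the merge-based approach below is more direct. The concat() route is mainly useful when the data frames are already indexed by the key.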
When the key columns have different names in the two data frames, pass them to merge() with the left_on and right_on parameters instead of the single on parameter.
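The following is a minimal sketch that rebuilds the question's data frames by hand and merges them on the differently named keys. Because merge() defaults to an inner join, only keys present in both frames survive. This reproduces the dropped rows described in the question.

```python
import pandas as pd

# Rebuild the example data frames from the question
df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# The keys have different names, so use left_on/right_on instead of on=
inner = df1.merge(df2, left_on="Column1", right_on="ColumnA")

# Default inner join: "h" and "k" have no match in df2 and are dropped
print(inner)
```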
You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:
df1.merge(df2,how='left', left_on='Column1', right_on='ColumnA')
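Here is a self-contained sketch of that left join on the question's data. It also drops the now-redundant ColumnA so that the columns match the desired output exactly. Dropping that column is an extra step, not part of the answer above.

```python
import pandas as pd

# Rebuild the example data frames from the question
df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# how="left" keeps every row of df1 in its original order; rows without a
# match in df2 get NaN in the columns that come from df2
merged = df1.merge(df2, how="left", left_on="Column1", right_on="ColumnA")

# ColumnA duplicates Column1 for matched rows (and is NaN otherwise), so drop it
merged = merged.drop(columns="ColumnA")
print(merged)
```

Rows "h" and "k" are kept, with NaN in ColumnB and ColumnC, as in the desired output.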