Ok this seems like it should be easy to do with merge or concatenate operations but I can't crack it. I'm working in pandas.
I have two DataFrames that share some rows, and I want to combine them so that no rows or columns are duplicated. It would work like this:
df1:
A B
a 1
b 2
c 3
df2:
A B
b 2
c 3
d 4
df3 = df1 combined with df2
A B
a 1
b 2
c 3
d 4
Some methods I've tried are to select the rows that are in one but not the other (an XOR) and then append them, but I can't figure out how to do the selection. The other idea I have is to append them and then delete duplicate rows, but I don't know how to do the latter.
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
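As a rough illustration of the set logic mentioned above, here is a minimal sketch. The frames `left` and `right` and the extra column `C` are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frames for illustration; column "C" is not in the question's data.
left = pd.DataFrame({"A": ["a", "b"], "B": [1, 2]})
right = pd.DataFrame({"A": ["c"], "B": [3], "C": [True]})

# Stack rows (axis=0). join="outer" keeps the union of the columns,
# filling cells that have no value with NaN.
union_cols = pd.concat([left, right], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in both frames.
shared_cols = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

print(union_cols)
print(shared_cols)
```

Note that `join` here controls which columns survive when stacking rows; it does not remove duplicate rows.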
To avoid duplicated columns when joining two DataFrames, call pd.merge() and pass it the join type (for example, an inner join) along with the column names to join on from the left and right DataFrames. Keep in mind that an inner join returns only the rows whose keys appear in both frames.
You want an outer merge:
In [103]:
df1.merge(df2, how='outer')
Out[103]:
A B
0 a 1
1 b 2
2 c 3
3 d 4
The above works because merge, when no join keys are specified, joins on all the columns the two dfs have in common (here both A and B), and how='outer' keeps the union of the rows from both sides, so each shared row appears only once.
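For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the two frames from the question. The second half is an assumption about what the asker meant by the XOR selection: it uses merge's `indicator=True` option to tag where each row came from. The names `tagged` and `only_one` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# With no `on=` given, merge joins on every column the two frames share (A and B).
df3 = df1.merge(df2, how="outer")
print(df3)

# indicator=True adds a "_merge" column holding "left_only", "right_only", or "both".
tagged = df1.merge(df2, how="outer", indicator=True)

# Rows found in exactly one of the two frames (the XOR from the question).
only_one = tagged[tagged["_merge"] != "both"].drop(columns="_merge")
print(only_one)
```

The row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.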
You can use the following to drop the duplicates:
pd.concat([df1, df2]).drop_duplicates()
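A runnable version of that one-liner, as a sketch using the frames from the question. The `reset_index(drop=True)` step is my addition, since concat keeps each frame's original row labels.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# Stack the rows, then drop rows whose values repeat across all columns
# (the first occurrence is kept).
df3 = pd.concat([df1, df2]).drop_duplicates()

# The surviving rows still carry their old labels (0, 1, 2, 2),
# so renumber them 0..n-1.
df3 = df3.reset_index(drop=True)
print(df3)
```

Unlike the outer merge, this also collapses rows that are duplicated within a single frame.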