Pandas/Python Combine two data frames with duplicate rows

Tags:

python

pandas

This seems like it should be easy to do with merge or concatenate operations, but I can't crack it. I'm working in pandas.

I have two DataFrames that share some rows, and I want to combine them so that no rows or columns are duplicated. It would work like this:

df1:

A B 
a 1
b 2
c 3

df2:

A B 
b 2
c 3
d 4

df3 = df1 combined with df2

A B 
a 1
b 2
c 3
d 4

One approach I've tried is to select the rows that are in one DataFrame but not the other (an XOR) and then append them, but I can't figure out how to do the selection. The other idea I have is to append them and then delete the duplicate rows, but I don't know how to do the latter.

426

asked Jun 18 '15 09:06

Elliott Miller

2 Answers

You want an outer merge:

In [103]:
df1.merge(df2, how='outer')

Out[103]:
   A  B
0  a  1
1  b  2
2  c  3
3  d  4

This works because merge, when no on argument is given, joins on the columns common to both DataFrames (here A and B). Specifying how='outer' keeps the union of the rows from both sides, so a row present in both df1 and df2 appears only once in the result.
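For anyone who wants to run this end to end, here is a minimal sketch. The frames are rebuilt from the tables in the question, and the name df3 is only the question's own label for the result.

```python
import pandas as pd

# Sample frames reconstructed from the question
df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# No `on=` argument: merge joins on every column the two frames share (A and B).
# how="outer" keeps rows that appear in either frame.
df3 = df1.merge(df2, how="outer")
print(df3)
```

One caveat: if the same row is repeated inside both df1 and df2, an outer merge multiplies those repeats rather than collapsing them, so this approach fits best when each frame is already free of internal duplicates.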

191

answered Sep 28 '22 01:09

EdChum

You can concatenate the two DataFrames and then drop the duplicate rows:

pd.concat([df1, df2]).drop_duplicates()
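A runnable sketch of the same idea, using the sample data from the question. The reset_index call is an addition that is not in the one-liner above: without it the result keeps the original row labels from df1 and df2, so the index can contain repeated values.

```python
import pandas as pd

# Sample frames reconstructed from the question
df1 = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["b", "c", "d"], "B": [2, 3, 4]})

# Stack the frames, keep the first occurrence of each distinct row,
# then renumber the rows 0..n-1 (optional, added here for a clean index)
df3 = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
print(df3)
```

Unlike the merge approach, drop_duplicates also collapses rows that are repeated within a single frame, which may or may not be what you want.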

answered Sep 28 '22 03:09

Manas Jani
