How do I do a SQL style disjoint or set difference on two Pandas DataFrame objects?

Tags: python, pandas

I'm trying to use Pandas to solve an issue caused by an idiot DBA who didn't back up a now-crashed data set, so I'm trying to find the differences between two columns. For reasons I won't get into, I'm using Pandas rather than a database.

What I'd like to do is, given:

Dataset A = [A, B, C, D, E]  
Dataset B = [C, D, E, F]

I would like to find the values that are not shared, i.e. those that appear in only one of the two datasets:

Dataset A!=B = [A, B, F]
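In set terms, the result above is the symmetric difference of A and B: the elements of (A − B) ∪ (B − A). Below is a minimal plain-Python sketch. It treats the letters from the example as string members of two sets, and the variable names are illustrative only.

```python
# Plain Python sets standing in for Dataset A and Dataset B from the example
a = {'A', 'B', 'C', 'D', 'E'}
b = {'C', 'D', 'E', 'F'}

# One-sided differences: values found in only one of the two sets
only_in_a = a - b   # contains 'A' and 'B'
only_in_b = b - a   # contains 'F'

# Their union is the symmetric difference, i.e. the "disjoint" values
disjoint = only_in_a | only_in_b

# Python also exposes this directly as the ^ operator and set.symmetric_difference
assert disjoint == a ^ b == a.symmetric_difference(b)
print(sorted(disjoint))  # ['A', 'B', 'F']
```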

In SQL this is standard set logic. The syntax differs between dialects, but the operation itself is standard. How do I apply this elegantly in Pandas? I would love to include some code, but nothing I have is even remotely correct; I don't know what I don't know. Pandas has set logic for intersection and union, but I can't find anything for disjoint/set difference.

Thanks!


asked Jan 18 '13 19:01

JPKab

1 Answer

You can use Python's built-in set.symmetric_difference method:

In [1]: from pandas import DataFrame
   ...: df1 = DataFrame(list('ABCDE'), columns=['x'])

In [2]: df1
Out[2]:
   x
0  A
1  B
2  C
3  D
4  E

In [3]: df2 = DataFrame(list('CDEF'), columns=['y'])

In [4]: df2
Out[4]:
   y
0  C
1  D
2  E
3  F

In [5]: set(df1.x).symmetric_difference(df2.y)
Out[5]: set(['A', 'B', 'F'])

answered Oct 18 '22 06:10

Zelazny7
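If you would rather stay inside pandas, two alternatives can be sketched as follows. This assumes a reasonably recent pandas and reuses the x and y column names from the session above.

The first uses Index.symmetric_difference, which returns the distinct disjoint values (sorted where possible). The second uses a pair of isin masks, roughly like SQL "NOT IN" on each side. It keeps whole rows and any duplicates instead of collapsing everything to a set.

```python
import pandas as pd

# Same example data as the session above
df1 = pd.DataFrame({'x': list('ABCDE')})
df2 = pd.DataFrame({'y': list('CDEF')})

# Option 1: Index.symmetric_difference gives the distinct disjoint values as an Index
sym = pd.Index(df1['x']).symmetric_difference(pd.Index(df2['y']))
print(list(sym))  # ['A', 'B', 'F']

# Option 2: boolean masks built with isin, one per side.
# Rows of df1 whose x does not appear anywhere in df2.y
only_in_df1 = df1[~df1['x'].isin(df2['y'])]
# Rows of df2 whose y does not appear anywhere in df1.x
only_in_df2 = df2[~df2['y'].isin(df1['x'])]

# Stack the two one-sided differences into a single Series
disjoint = pd.concat([only_in_df1['x'], only_in_df2['y']], ignore_index=True)
print(disjoint.tolist())  # ['A', 'B', 'F']
```

The isin version is the better fit when the frames carry other columns you want to inspect alongside the unmatched values, since only_in_df1 and only_in_df2 are still full DataFrames.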
