I'm trying to use Pandas to solve an issue courtesy of an idiot DBA not doing a backup of a now crashed data set, so I'm trying to find differences between two columns. For reasons I won't get into, I'm using Pandas rather than a database.
What I'd like to do is, given:
Dataset A = [A, B, C, D, E]
Dataset B = [C, D, E, F]
I would like to find the values that appear in only one of the two datasets (the symmetric difference).
Dataset A!=B = [A, B, F]
In SQL, this is standard set logic; the syntax differs by dialect, but it is a standard operation. How do I elegantly apply this in Pandas? I would love to include some code, but nothing I have is even remotely correct. It's a situation in which I don't know what I don't know. As far as I can tell, Pandas has set logic for intersection and union, but nothing for disjoint/set difference.
Thanks!
The compare method in pandas shows the differences between two DataFrames. It compares them element by element and presents the differing values side by side. It can only compare DataFrames with identical row and column labels, and therefore the same shape.
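As a rough illustration of the compare method described above, here is a minimal sketch. The frames df_a and df_b and their values are made up for this example, and it assumes pandas 1.1 or later, where DataFrame.compare is available. Note that this is a position-by-position comparison, not a set difference.

```python
import pandas as pd

# Two hypothetical frames with identical row and column labels.
df_a = pd.DataFrame({"x": ["A", "B", "C"]})
df_b = pd.DataFrame({"x": ["A", "Z", "C"]})

# compare() keeps only the cells that differ and labels the two
# sides "self" (df_a) and "other" (df_b).
diff = df_a.compare(df_b)
print(diff)
```

Only row 1 survives here, because it is the only position where the two frames disagree.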
Two pandas Series can be compared element by element with relational operators (==, !=, <, > and so on). The result is a Series of True/False values. To check whether two Series are equal as a whole, use Series.equals().
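A small sketch of the two approaches just mentioned, using two made-up Series (s1 and s2 are illustrative names only). The == operator expects both Series to carry the same index labels.

```python
import pandas as pd

# Hypothetical Series sharing the same default index 0..2.
s1 = pd.Series(["A", "B", "C"])
s2 = pd.Series(["A", "X", "C"])

# Element-wise comparison: one boolean per position.
elementwise = s1 == s2

# Whole-Series comparison: a single boolean.
same = s1.equals(s2)

print(elementwise.tolist())
print(same)
```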
The difference between consecutive rows or columns of a pandas DataFrame is computed with the diff() method. The axis parameter decides whether the difference is taken between rows (axis=0, the default) or between columns (axis=1).
Series.diff() computes the first discrete difference of the elements: each element minus another element in the same Series (by default, the element in the previous row). The periods parameter sets how many positions to shift when calculating the difference, and it accepts negative values.
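To make the behaviour of diff() concrete, here is a short sketch with made-up numbers. Keep in mind that diff() is an arithmetic difference between neighbouring elements, so it does not answer the set-difference question asked above.

```python
import pandas as pd

s = pd.Series([1, 3, 6, 10])

# Default: each element minus the previous one; the first result is NaN.
prev_diff = s.diff()

# Negative periods: each element minus the following one; the last result is NaN.
next_diff = s.diff(periods=-1)

df = pd.DataFrame({"a": [1, 3, 6], "b": [2, 6, 12]})

# axis=0 (default) subtracts the previous row; axis=1 subtracts the previous column.
row_diff = df.diff()
col_diff = df.diff(axis=1)

print(prev_diff.tolist())
print(col_diff)
```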
You can use the set.symmetric_difference function:
In [1]: from pandas import DataFrame
In [2]: df1 = DataFrame(list('ABCDE'), columns=['x'])
In [3]: df1
Out[3]:
   x
0  A
1  B
2  C
3  D
4  E
In [4]: df2 = DataFrame(list('CDEF'), columns=['y'])
In [5]: df2
Out[5]:
   y
0  C
1  D
2  E
3  F
In [6]: set(df1.x).symmetric_difference(df2.y)
Out[6]: {'A', 'B', 'F'}
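If you would rather stay inside pandas than go through Python sets, the following is one possible sketch. It relies on Index.symmetric_difference and Series.isin, both existing pandas methods. The variable names only_in_one, only_in_a and only_in_b are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame(list("ABCDE"), columns=["x"])
df2 = pd.DataFrame(list("CDEF"), columns=["y"])

# Values that occur in exactly one of the two columns.
only_in_one = pd.Index(df1["x"]).symmetric_difference(pd.Index(df2["y"]))

# One-sided differences, in case you need to know which side a value came from.
only_in_a = df1.loc[~df1["x"].isin(df2["y"]), "x"]
only_in_b = df2.loc[~df2["y"].isin(df1["x"]), "y"]

print(list(only_in_one))
print(only_in_a.tolist(), only_in_b.tolist())
```

The isin version keeps the original row index, which can help when you need to trace the unmatched values back to their rows.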