I have a question about a pandas DataFrame operation.
Suppose I have two DataFrames of different sizes: they have the same number of rows but a different number of columns.
import pandas as pd
a = pd.DataFrame({"code1": ['A', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'H']})
b = pd.DataFrame({"code1": ['A1', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'N'], "code3": ['A2', 'L', 'M', '']})
For visualization:
a:
  code1 code2
0     A     E
1     B     F
2     C     G
3     D     H
b:
  code1 code2 code3
0    A1     E    A2
1     B     F     L
2     C     G     M
3     D     N
My ideal output is a DataFrame c like this:
c:
  addedword deletedword
0     A1,A2           A
1         L
2         M
3         N           H
Basically, I want to compare each row of a with the corresponding row of b, element by element, and record any strings that were added or deleted in a new DataFrame.
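To pin down what "added" and "deleted" mean here, a minimal sketch for row 0 only, assuming each row is treated as an unordered collection of values and that empty strings count as "no value": the added strings are the row from b minus the row from a, and the deleted strings are the reverse.

```python
import pandas as pd

a = pd.DataFrame({"code1": ['A', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'H']})
b = pd.DataFrame({"code1": ['A1', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'N'],
                  "code3": ['A2', 'L', 'M', '']})

# Row 0 of each frame as a set of values, with empty cells discarded
row_a = set(a.iloc[0]) - {''}
row_b = set(b.iloc[0]) - {''}

added = row_b - row_a    # values in b's row that are not in a's row
deleted = row_a - row_b  # values in a's row that are not in b's row
print(sorted(added), sorted(deleted))  # ['A1', 'A2'] ['A']
```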
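Use set differences: treat each row as a set of values, then (row of b) minus (row of a) gives the added strings and (row of a) minus (row of b) gives the deleted ones.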
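import pandas as pd

g = lambda x: map(lambda row: set(row) - {''}, x.values)  # each row -> set of its non-empty values (blank cells are not "added")
f = lambda t: (t[1] - t[0], t[0] - t[1])  # t is (set from a, set from b) -> (added, deleted)
h = lambda y: tuple(','.join(sorted(s)) for s in y)  # stitch each set back into a sorted, comma-separated string
c = pd.DataFrame(
    list(map(h, map(f, zip(*map(g, (a, b)))))),
    columns=['Added', 'Deleted']
)
print(c)

   Added Deleted
0  A1,A2       A
1      L
2      M
3      N       H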
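If the one-liner is hard to read, the same set-difference idea can be written as an explicit loop. This is a sketch, not the answer's method verbatim: the helper name `diff_rows` is hypothetical, and unlike the sorted output above it keeps values in their original column order (so "A1,A2" follows b's column order). Like the set version, it pairs rows by position and ignores which column a value sits in, so a value that merely moved from code1 to code2 is not reported.

```python
import pandas as pd

a = pd.DataFrame({"code1": ['A', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'H']})
b = pd.DataFrame({"code1": ['A1', 'B', 'C', 'D'], "code2": ['E', 'F', 'G', 'N'],
                  "code3": ['A2', 'L', 'M', '']})

def diff_rows(row_a, row_b):
    """Hypothetical helper: return (added, deleted) strings for one pair of rows."""
    set_a, set_b = set(row_a), set(row_b)
    # dict.fromkeys de-duplicates while keeping column order; `if v` skips empty cells
    added = [v for v in dict.fromkeys(row_b) if v and v not in set_a]
    deleted = [v for v in dict.fromkeys(row_a) if v and v not in set_b]
    return ','.join(added), ','.join(deleted)

# Rows are paired by position, so both frames need the same number of rows
c = pd.DataFrame(
    [diff_rows(ra, rb)
     for ra, rb in zip(a.itertuples(index=False), b.itertuples(index=False))],
    columns=['addedword', 'deletedword'],
)
print(c)
```

Here the column names match the desired output c from the question.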