
Pandas: difference between two columns containing lists

I have a DataFrame whose column values are lists, and I want to find the difference between two columns. In other words, I want all the elements in column A that are not in column B.

import pandas as pd

data = {'NAME': ['JOHN', 'MARY', 'CHARLIE'],
        'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
        'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]]}
df = pd.DataFrame(data)
df = df[['NAME', 'A', 'B']]  # fix the column order

# Concatenating the two list columns works:
df['C'] = df['A'] + df['B']

      NAME          A          B                   C
0     JOHN  [1, 2, 3]  [2, 3, 4]  [1, 2, 3, 2, 3, 4]
1     MARY  [2, 3, 4]  [3, 4, 5]  [2, 3, 4, 3, 4, 5]
2  CHARLIE  [3, 4, 5]  [4, 5, 6]  [3, 4, 5, 4, 5, 6]

Is there a way to find the differences, with something like this?

df['C'] = df['A'] - df['B']
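For context, a small sketch of why the `+` version works but the `-` version does not: on these object-dtype columns, pandas applies the operator to each pair of cells, so it can only do what Python lists themselves support. Lists define `+` (concatenation) but not `-`, so the subtraction raises a `TypeError`. The variable name `concatenated` is just for illustration.

```python
import pandas as pd

data = {'NAME': ['JOHN', 'MARY', 'CHARLIE'],
        'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
        'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]]}
df = pd.DataFrame(data)

# '+' on two object columns calls list.__add__ on each pair of cells,
# i.e. row-wise list concatenation.
concatenated = df['A'] + df['B']
print(concatenated.tolist())

# Lists have no '-' operator, so the same cell-by-cell call fails.
try:
    df['A'] - df['B']
except TypeError as exc:
    print('TypeError:', exc)
```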

I know I could use df.apply with a function, but row-by-row processing will be slow since I have around 400K rows. I'm looking for something as straightforward as

df['C'] = df['A'] + df['B']
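If the order of A's elements (and any duplicates) matters, one common alternative, not taken from the original thread, is a list comprehension over `zip()`. It is still a Python-level loop, but it usually carries less per-row overhead than `df.apply(..., axis=1)`; timing it on the real 400K rows is the only way to be sure. The helper `list_difference` is a hypothetical name introduced for this sketch.

```python
import pandas as pd

data = {'NAME': ['JOHN', 'MARY', 'CHARLIE'],
        'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
        'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]]}
df = pd.DataFrame(data)


def list_difference(a, b):
    """Hypothetical helper: items of a not in b, keeping a's order and duplicates."""
    b_set = set(b)  # build once per row for fast membership tests
    return [x for x in a if x not in b_set]


# Iterate over the two columns in lockstep instead of calling df.apply(axis=1).
df['C'] = [list_difference(a, b) for a, b in zip(df['A'], df['B'])]
print(df)
```

Unlike a set difference, this keeps repeated values, e.g. `list_difference([1, 1, 2, 3], [2])` gives `[1, 1, 3]`.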
asked May 08 '26 09:05 by Prasun Velayudhan


1 Answer

For a set difference, convert each list to a set and subtract the two columns:

df['C'] = df['A'].map(set) - df['B'].map(set)
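A self-contained sketch of the answer, reusing the question's sample data. `Series.map(set)` turns each list into a set, and subtracting two object-dtype Series applies `set.__sub__` pair by pair. Because sets are unordered and drop duplicates, the final `map(sorted)` step (an addition, not part of the original answer) converts each result back into a sorted list; the name `diff` is only illustrative.

```python
import pandas as pd

data = {'NAME': ['JOHN', 'MARY', 'CHARLIE'],
        'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
        'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]]}
df = pd.DataFrame(data)

# Convert each list cell to a set, then subtract the two Series cell by cell.
diff = df['A'].map(set) - df['B'].map(set)

# Sets have no order; sorting gives a deterministic list per row.
df['C'] = diff.map(sorted)
print(df[['NAME', 'C']])
```

With the sample data, column C ends up holding `[1]`, `[2]` and `[3]`.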
answered May 10 '26 22:05 by chrisb


