I have a df such as:
import pandas as pd
df = pd.DataFrame({'i': [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 4}], 'j': [{2, 3}, {1}, {4}, {3, 4}]})
so it looks like
>>> df
i j
0 {1, 2, 3, 4} {2, 3}
1 {1, 2, 3, 4} {1}
2 {1, 2, 3, 4} {4}
3 {1, 2, 3, 4} {3, 4}
I would like to compute df.i.intersection(df.j) and assign that to be column k. That is, I want this:
df['k']=[df.i.iloc[t].intersection(df.j.iloc[t]) for t in range(4)]
>>> df.k
0 {2, 3}
1 {1}
2 {4}
3 {3, 4}
Name: k, dtype: object
Is there a df.apply() for this? The actual df is millions of rows.
Working with sets, lists and dicts in pandas is a bit problematic, because pandas works best with scalars:
df['k'] = [x[0] & x[1] for x in zip(df['i'], df['j'])]
print (df)
i j k
0 {1, 2, 3, 4} {2, 3} {2, 3}
1 {1, 2, 3, 4} {1} {1}
2 {1, 2, 3, 4} {4} {4}
3 {1, 2, 3, 4} {3, 4} {3, 4}
df['k'] = [x[0].intersection(x[1]) for x in zip(df['i'], df['j'])]
print (df)
i j k
0 {1, 2, 3, 4} {2, 3} {2, 3}
1 {1, 2, 3, 4} {1} {1}
2 {1, 2, 3, 4} {4} {4}
3 {1, 2, 3, 4} {3, 4} {3, 4}
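For reference, here is a self-contained sketch of the two list-comprehension variants above. The frame is rebuilt with the plain dict constructor, which is my assumption for newer pandas versions where DataFrame.from_items is no longer available; the names k_and and k_method are only for illustration. The two forms give the same result when both columns hold sets. The difference is that & needs a set on both sides, while .intersection() accepts any iterable as its argument.

```python
import pandas as pd

# Same data as in the question, built with the dict constructor.
df = pd.DataFrame({
    'i': [{1, 2, 3, 4} for _ in range(4)],
    'j': [{2, 3}, {1}, {4}, {3, 4}],
})

# Operator form: both operands must be sets (or frozensets).
k_and = [a & b for a, b in zip(df['i'], df['j'])]

# Method form: the argument may be any iterable, not only a set.
k_method = [a.intersection(b) for a, b in zip(df['i'], df['j'])]

df['k'] = k_and
print(df)
```

If column j could ever contain lists instead of sets, the method form is the safer choice of the two.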
Solution with apply:
df['k'] = df.apply(lambda x: x['i'].intersection(x['j']), axis=1)
print (df)
i j k
0 {1, 2, 3, 4} {2, 3} {2, 3}
1 {1, 2, 3, 4} {1} {1}
2 {1, 2, 3, 4} {4} {4}
3 {1, 2, 3, 4} {3, 4} {3, 4}
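Since the real frame has millions of rows, it is probably worth timing both approaches before choosing one. apply with axis=1 calls the lambda once per row and builds a Series for each row, so it is usually the slower option. Below is a rough sketch of how one might measure it; the synthetic frame big, its size n and the helper names with_zip and with_apply are all hypothetical, and the timings will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

n = 10_000  # hypothetical size, kept small so the sketch runs quickly
big = pd.DataFrame({
    'i': [{1, 2, 3, 4} for _ in range(n)],
    'j': [{r % 4 + 1} for r in range(n)],
})

def with_zip():
    # Iterate over the two columns in parallel, outside of pandas.
    return [a & b for a, b in zip(big['i'], big['j'])]

def with_apply():
    # Row-wise apply: the lambda receives each row as a Series.
    return big.apply(lambda row: row['i'] & row['j'], axis=1)

t_zip = timeit.timeit(with_zip, number=3)
t_apply = timeit.timeit(with_apply, number=3)
print(f"zip: {t_zip:.3f}s  apply: {t_apply:.3f}s")
```

Both functions return the same intersections, so whichever is faster on your data can be assigned to df['k'].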