I have this cunning piece of code that I'm using on the following dataset:
import pandas as pd

df = pd.DataFrame({
    'contact_email': ['[email protected]', '[email protected]', '[email protected]'],
    'interest': ['Math', 'Science', 'Science']
})
print(df)
interest contact_email
0 Math [email protected]
1 Science [email protected]
2 Science [email protected]
df = df.groupby('contact_email').agg({'interest' : ' '.join}).reset_index()
print(df)
contact_email interest
0 [email protected] Math Science Science
This is so close to what I wanted, but I need to return only the unique interests. (I have users/customers submitting the same form, with the same values, almost 10 times!)
Also, as a nice-to-have, does anyone know how to remove the 0, 1, 2, 3 index?
Thanks!
Use unique to remove duplicates:
df = (df.groupby('contact_email')
        .agg({'interest' : lambda x: ' '.join(x.unique())})
        .reset_index())
print(df)
contact_email interest
0 [email protected] Math Science
Or use sets, but note that the order of the values may change:
df = df.groupby('contact_email').agg({'interest' : lambda x: ' '.join(set(x))}).reset_index()
print(df)
contact_email interest
0 [email protected] Math Science
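If the order of the joined values matters, one option is to sort the set before joining, or to deduplicate with dict.fromkeys, which keeps the first-seen order. A minimal sketch, assuming made-up sample data (the addresses and the names sorted_df and ordered_df are placeholders, not from the question):

```python
import pandas as pd

# Hypothetical sample data; the addresses are placeholders.
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com', 'a@example.com', 'b@example.com'],
    'interest': ['Science', 'Math', 'Science', 'Art'],
})

# sorted(set(x)) gives alphabetical order, independent of set iteration order.
sorted_df = (df.groupby('contact_email')
               .agg({'interest': lambda x: ' '.join(sorted(set(x)))})
               .reset_index())

# dict.fromkeys(x) drops duplicates while keeping first-seen order.
ordered_df = (df.groupby('contact_email')
                .agg({'interest': lambda x: ' '.join(dict.fromkeys(x))})
                .reset_index())

print(sorted_df)
print(ordered_df)
```

Sorting is the simpler choice when any stable order will do; dict.fromkeys is closer to what x.unique() returns, since both keep the order in which values first appear.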
Or use drop_duplicates:
df = (df.drop_duplicates(subset=['contact_email','interest'])
        .groupby('contact_email')
        .agg({'interest' : ' '.join})
        .reset_index())
print(df)
contact_email interest
0 [email protected] Math Science
Since you have only one function, you can use groupby + apply and utilize set:
res = df.groupby('contact_email')['interest']\
        .apply(set).apply(' '.join)\
        .reset_index()
print(res)
contact_email interest
0 [email protected] Math Science
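As for the nice-to-have about the 0, 1, 2, 3 index: a DataFrame always carries an index, but it can be left out when printing or exporting. A minimal sketch, assuming a result frame shaped like the one above (the address is a placeholder):

```python
import pandas as pd

# Hypothetical result frame; the address is a placeholder.
res = pd.DataFrame({
    'contact_email': ['a@example.com'],
    'interest': ['Math Science'],
})

# to_string(index=False) renders the frame without the row labels.
text = res.to_string(index=False)
print(text)

# Exporting to CSV without the index works the same way.
csv_text = res.to_csv(index=False)
print(csv_text)
```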