I have this cunning piece of code that I'm using on the following dataset:
import pandas as pd

df = pd.DataFrame({
    'contact_email': ['[email protected]', '[email protected]', '[email protected]'],
    'interest': ['Math', 'Science', 'Science']
})
print(df)
       contact_email interest
0  [email protected]     Math
1  [email protected]  Science
2  [email protected]  Science
df = df.groupby('contact_email').agg({'interest' : ' '.join}).reset_index()
print(df)
       contact_email              interest
0  [email protected]  Math Science Science
This is so close to what I wanted, but I need to return only the unique interests. (I have users/customers submitting the same form, with the same values, almost 10 times!)
Also, as a nice-to-have, does anyone know how to remove the 0, 1, 2, 3 index?
Thanks!
Use unique to remove duplicates:
df = (df.groupby('contact_email')
        .agg({'interest' : lambda x: ' '.join(x.unique())})
        .reset_index())
print(df)
   contact_email      interest
0  [email protected]  Math Science
Or use a set, but note that the order of the values may change:
df = df.groupby('contact_email').agg({'interest' : lambda x: ' '.join(set(x))}).reset_index()
print(df)
   contact_email      interest
0  [email protected]  Math Science
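If the order matters and you still want a one-pass deduplication, one possible alternative to set is dict.fromkeys, which keeps the first occurrence of each value in insertion order. The sketch below is only an illustration: the sample addresses and the helper name join_unique are made up, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data (addresses and interests are made up for illustration).
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com', 'a@example.com', 'b@example.com'],
    'interest': ['Science', 'Math', 'Science', 'Art'],
})

def join_unique(values):
    # dict.fromkeys keeps the first occurrence of each key in insertion order,
    # so duplicates are dropped without reshuffling the remaining values.
    return ' '.join(dict.fromkeys(values))

res = (df.groupby('contact_email')
         .agg({'interest': join_unique})
         .reset_index())
print(res)
```

Here 'Science' stays ahead of 'Math' for the first address because that is the order in which the values first appear, whereas set makes no such promise.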
Or drop_duplicates:
df = (df.drop_duplicates(subset=['contact_email','interest'])
       .groupby('contact_email')
       .agg({'interest' : ' '.join})
       .reset_index())
print(df)
   contact_email      interest
0  [email protected]  Math Science
Since you have only one function, you can use groupby + apply and utilize set:
res = df.groupby('contact_email')['interest']\
        .apply(set).apply(' '.join)\
        .reset_index()
print(res)
   contact_email      interest
0  [email protected]  Math Science
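As for the nice-to-have about the 0, 1, 2, 3 index: a DataFrame always carries an index, so it cannot really be removed, but it can be left out when the frame is displayed or exported. A minimal sketch, assuming you only need to hide it in printed or CSV output (the sample addresses are made up):

```python
import pandas as pd

# Hypothetical sample data (addresses and interests are made up for illustration).
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com', 'a@example.com', 'b@example.com'],
    'interest': ['Math', 'Science', 'Science', 'Art'],
})

res = (df.drop_duplicates(subset=['contact_email', 'interest'])
         .groupby('contact_email')
         .agg({'interest': ' '.join})
         .reset_index())

# to_string(index=False) renders the frame without the 0, 1, 2, ... row labels.
text = res.to_string(index=False)
print(text)

# When exporting, index=False likewise leaves the row labels out of the CSV.
csv_text = res.to_csv(index=False)
print(csv_text)
```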