Grouping one column using agg & join but only on unique values

I have this cunning piece of code that I'm using on the following dataset:

import pandas as pd

df = pd.DataFrame({
    'contact_email': ['[email protected]', '[email protected]', '[email protected]'],
    'interest': ['Math', 'Science', 'Science']
})
print(df)
    interest contact_email
0   Math    [email protected]
1   Science [email protected]
2   Science [email protected]

df = df.groupby('contact_email').agg({'interest' : ' '.join}).reset_index()
print(df)

        contact_email   interest
0   [email protected]   Math Science Science

This is close to what I want, but I need each interest to appear only once per contact. (I have users/customers submitting the same form, with the same values, almost 10 times!)

Also, as a nice-to-have, does anyone know how to remove the 0, 1, 2, 3 index?

Thanks!

Umar.H asked Dec 23 '22 05:12


2 Answers

Use unique to remove duplicates within each group:

df = (df.groupby('contact_email')
        .agg({'interest' : lambda x: ' '.join(x.unique())})
        .reset_index())
print(df)
   contact_email      interest
0  [email protected]  Math Science

Or use a set, but note that the order of the values may change:

df = df.groupby('contact_email').agg({'interest' : lambda x: ' '.join(set(x))}).reset_index()
print(df)
   contact_email      interest
0  [email protected]  Math Science
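
If the original order matters, one possible variation is dict.fromkeys, which keeps the first occurrence of each value in insertion order (Python 3.7+). This is only a sketch: the example.com addresses and the extra row are made-up placeholders, not data from the question.

```python
import pandas as pd

# Hypothetical sample data (placeholder addresses, not from the question).
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com',
                      'a@example.com', 'b@example.com'],
    'interest': ['Science', 'Math', 'Science', 'Math'],
})

# dict.fromkeys drops repeats but keeps first-seen order,
# so 'Science' stays ahead of 'Math' for the first contact.
res = (df.groupby('contact_email')['interest']
         .agg(lambda x: ' '.join(dict.fromkeys(x)))
         .reset_index())
print(res)
```

Unlike set, this should give the same string on every run, at the cost of being a little less obvious to read.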

Or call drop_duplicates before grouping:

df = (df.drop_duplicates(subset=['contact_email','interest'])
       .groupby('contact_email')
       .agg({'interest' : ' '.join})
       .reset_index())
print(df)
   contact_email      interest
0  [email protected]  Math Science
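
On the side question about the 0, 1, 2, 3 index: a DataFrame always carries an index, so it cannot be removed outright, but it can be hidden when printing or replaced with a meaningful column. A minimal sketch, assuming placeholder example.com addresses rather than the real data:

```python
import pandas as pd

# Hypothetical sample data (placeholder addresses, not from the question).
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com', 'b@example.com'],
    'interest': ['Math', 'Science', 'Math'],
})

out = (df.drop_duplicates(subset=['contact_email', 'interest'])
         .groupby('contact_email')
         .agg({'interest': ' '.join})
         .reset_index())

# Option 1: keep the default index but leave it out of the printed text.
text = out.to_string(index=False)
print(text)

# Option 2: use contact_email as the index instead of 0, 1, 2, ...
by_email = out.set_index('contact_email')
print(by_email)
```

When writing to a file, to_csv(..., index=False) has the same effect as option 1.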
jezrael answered Dec 25 '22 19:12


Since you have only one function, you can use groupby + apply and utilize set:

res = df.groupby('contact_email')['interest']\
        .apply(set).apply(' '.join)\
        .reset_index()

print(res)

   contact_email      interest
0  [email protected]  Math Science
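
Because a set has no guaranteed order, the joined string can differ between runs. If a stable result is needed, one option is to sort the set before joining. This is a sketch with made-up example.com addresses, not the data from the question:

```python
import pandas as pd

# Hypothetical sample data (placeholder addresses, not from the question).
df = pd.DataFrame({
    'contact_email': ['a@example.com', 'a@example.com',
                      'a@example.com', 'b@example.com'],
    'interest': ['Science', 'Math', 'Science', 'Math'],
})

# sorted(set(x)) removes duplicates and fixes the order alphabetically.
res = (df.groupby('contact_email')['interest']
         .apply(lambda x: ' '.join(sorted(set(x))))
         .reset_index())
print(res)
```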
jpp answered Dec 25 '22 19:12