I have a basic Python question.
I have a pandas dataframe like this:
ID | Name | User_id
---+------+--------
 1   John     10
 2   Tom      11  
 3   Sam      12
 4   Ben      13
 5   Jen      10
 6   Tim      11
 7   Sean     14
 8   Ana      15
 9   Sam      12
 10  Ben      13
I want to get the names and user ids that share the same value for User_id, without returning names that appear twice. So I would like the output to look something like this:
John Jen 10
Tom Tim 11
The pandas DataFrame.duplicated() method finds duplicate rows in a DataFrame. It returns a boolean Series that marks each row as duplicate (True) or unique (False).
duplicated() can check for duplicates across all columns or only across a chosen subset of columns. By default the first occurrence of a row is not marked; only the later repeats are True.
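As a rough sketch of how duplicated() could be applied to the question above: the DataFrame below is rebuilt by hand from the table in the question, and the variable names unique_pairs and shared are only illustrative. It first drops repeated (Name, User_id) pairs so that Sam and Ben count once, then uses keep=False so that every row of a shared User_id is flagged, not just the later ones.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "ID": range(1, 11),
    "Name": ["John", "Tom", "Sam", "Ben", "Jen", "Tim", "Sean", "Ana", "Sam", "Ben"],
    "User_id": [10, 11, 12, 13, 10, 11, 14, 15, 12, 13],
})

# Drop repeated (Name, User_id) pairs so a name that appears twice counts once
unique_pairs = df.drop_duplicates(subset=["Name", "User_id"])

# keep=False marks every row whose User_id occurs more than once,
# instead of leaving the first occurrence unmarked
shared = unique_pairs[unique_pairs.duplicated(subset="User_id", keep=False)]

print(shared[["Name", "User_id"]])
```

This keeps one row per name, so it would still need a groupby or a sort on User_id if the names should end up side by side.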
The pandas Series.equals() function tests whether two pandas objects contain the same elements. It compares two Series or DataFrames against each other to see whether they have the same shape and elements. NaNs in the same location are considered equal.
IIUC you could do it this way: groupby on 'User_id', take the unique names in each group, and then keep only the groups with more than one name:
In [54]:
group = df.groupby('User_id')['Name'].unique()
In [55]:
group[group.apply(lambda x: len(x)>1)]
Out[55]:
User_id
10    [John, Jen]
11     [Tom, Tim]
Name: Name, dtype: object
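For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same groupby idea. The DataFrame is rebuilt by hand from the table in the question, and names_by_user, shared and lines are just illustrative names. The last step formats each group to resemble the output asked for in the question.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "ID": range(1, 11),
    "Name": ["John", "Tom", "Sam", "Ben", "Jen", "Tim", "Sean", "Ana", "Sam", "Ben"],
    "User_id": [10, 11, 12, 13, 10, 11, 14, 15, 12, 13],
})

# One array of distinct names per User_id (repeated names collapse to one)
names_by_user = df.groupby("User_id")["Name"].unique()

# Keep only the User_ids that are shared by more than one distinct name
shared = names_by_user[names_by_user.map(len) > 1]

# Format each group as "Name1 Name2 User_id"
lines = [f"{' '.join(names)} {user_id}" for user_id, names in shared.items()]
for line in lines:
    print(line)
```

Sam and Ben drop out because unique() reduces each of them to a single name for their User_id, so their groups have length 1.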