I have a basic Python question.
I have a pandas dataframe like this:
ID | Name | User_id
---+------+--------
 1 | John | 10
 2 | Tom  | 11
 3 | Sam  | 12
 4 | Ben  | 13
 5 | Jen  | 10
 6 | Tim  | 11
 7 | Sean | 14
 8 | Ana  | 15
 9 | Sam  | 12
10 | Ben  | 13
I want to get the names and user ids that share the same value for User_id, without returning names that appear twice. So I would like the output to look something like this:
John Jen 10
Tom Tim 11
The pandas.DataFrame.duplicated() method finds duplicate rows in a DataFrame, based either on all columns or on a chosen subset of columns. It returns a boolean Series that is True for each row identified as a duplicate; by default the first occurrence is not marked.
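As a rough sketch of how duplicated() could be applied to the data in the question (the frame below is rebuilt by hand from the table, and the variable names are only illustrative), the `subset` and `keep` parameters control which column is compared and which occurrences get flagged:

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "ID": range(1, 11),
    "Name": ["John", "Tom", "Sam", "Ben", "Jen", "Tim", "Sean", "Ana", "Sam", "Ben"],
    "User_id": [10, 11, 12, 13, 10, 11, 14, 15, 12, 13],
})

# Default keep="first": only the later occurrences of a User_id are flagged.
later_repeats = df.duplicated(subset="User_id")

# keep=False flags every row whose User_id occurs more than once.
all_repeats = df.duplicated(subset="User_id", keep=False)

# Rows that share their User_id with at least one other row.
shared = df[all_repeats]
print(shared)
```

Note that this alone still includes Sam and Ben, whose names repeat under the same User_id, so it does not by itself produce the output asked for.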
The pandas Series.equals() function tests whether two pandas objects contain the same elements. It lets two Series or DataFrames be compared against each other to check that they have the same shape and elements. NaNs in the same location are considered equal.
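A small illustration of the NaN behaviour described above, using made-up Series that are not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical example values, chosen only to show the NaN handling.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])
c = pd.Series([1.0, 2.0, 3.0])

same = a.equals(b)            # NaNs in the same position count as equal
different = a.equals(c)       # one element differs
elementwise = (a == b).all()  # element-wise ==, where NaN != NaN

print(same, different, elementwise)
```

The element-wise comparison treats NaN as unequal to itself, which is why equals() tends to be the safer check when missing values may be present.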
IIUC, you could do it this way: groupby on 'User_id', collect the unique names in each group, and then keep only the groups with more than one name:
In [54]:
group = df.groupby('User_id')['Name'].unique()
In [55]:
group[group.apply(lambda x: len(x)>1)]
Out[55]:
User_id
10 [John, Jen]
11 [Tom, Tim]
Name: Name, dtype: object
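For reference, here is a self-contained sketch of the same idea that can be run as a script. The DataFrame is rebuilt by hand from the table in the question, and names such as `names_per_user` and `lines` are just illustrative:

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "ID": range(1, 11),
    "Name": ["John", "Tom", "Sam", "Ben", "Jen", "Tim", "Sean", "Ana", "Sam", "Ben"],
    "User_id": [10, 11, 12, 13, 10, 11, 14, 15, 12, 13],
})

# One array of distinct names per User_id.
names_per_user = df.groupby("User_id")["Name"].unique()

# Keep only the ids that map to more than one distinct name.
shared = names_per_user[names_per_user.map(len) > 1]

# Format each group like the output requested in the question.
lines = [f"{' '.join(names)} {user_id}" for user_id, names in shared.items()]
print("\n".join(lines))
```

Because unique() collapses repeated names within a group, ids 12 and 13 (Sam and Ben, each listed twice) end up with a single name and are filtered out.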