Python - Delete duplicates in a dataframe based on two columns combinations?

I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and I would like to remove duplicates based on the combination of the Name1 and Name2 columns, regardless of their order.

In my example both rows contain the same pair of names, only in a different order. I would like to delete the second row and keep the first one, so the end result should be:

Name1 Name2 Value
Juan  Ale   1

Any idea will be really appreciated!

asked Jul 05 '18 01:07

Juan

2 Answers

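Use np.sort with duplicated. Sorting the two names within each row makes their order irrelevant, and negating the mask with ~ keeps only the first occurrence of each pair:

df[~pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
Out[614]: 
  Name1 Name2  Value
0  Juan   Ale      1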
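
For reference, here is a self-contained sketch of the sort-based approach that keeps the first occurrence of each unordered pair. The third row ("Ana", "Juan") and the variable names are assumptions added for illustration; they are not part of the question. The mask is converted to a plain NumPy array so it is applied by position, which should avoid surprises when df does not have a default 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Data from the question plus one hypothetical extra row, so the result is not trivial.
df = pd.DataFrame({
    "Name1": ["Juan", "Ale", "Ana"],
    "Name2": ["Ale", "Juan", "Juan"],
    "Value": [1, 1, 2],
})

# Sort the two names within each row so ("Juan", "Ale") and ("Ale", "Juan") become identical.
pairs = np.sort(df[["Name1", "Name2"]].to_numpy(), axis=1)

# duplicated() marks every repeat of a pair; the first occurrence stays False.
is_repeat = pd.DataFrame(pairs).duplicated().to_numpy()

# Negate the mask to keep first occurrences; a plain array is applied by position.
deduped = df[~is_repeat]
print(deduped)
```

Only the mask is built from the sorted copy, so the surviving rows keep their original Name1/Name2 order.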

Performance

df=pd.concat([df]*100000)

%timeit df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
10 loops, best of 3: 69.3 ms per loop
%timeit df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
1 loop, best of 3: 3.72 s per loop
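
Outside IPython, the comparison could be reproduced roughly as follows with the standard timeit module. This is only a sketch: the helper names sort_based and frozenset_based are made up here, the frame is smaller than the one timed above so the script finishes quickly, and the actual numbers will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"Name1": ["Juan", "Ale"], "Name2": ["Ale", "Juan"], "Value": [1, 1]})
# 1000 repeats (an assumption, fewer than above); ignore_index gives unique row labels.
big = pd.concat([base] * 1000, ignore_index=True)


def sort_based(frame):
    # Hypothetical helper: sort each pair, then drop repeats by position.
    pairs = np.sort(frame[["Name1", "Name2"]].to_numpy(), axis=1)
    return frame[~pd.DataFrame(pairs).duplicated().to_numpy()]


def frozenset_based(frame):
    # Hypothetical helper: build one frozenset per row, then drop repeats.
    keys = frame[["Name1", "Name2"]].apply(frozenset, axis=1)
    return frame[~keys.duplicated()]


# Check that both approaches agree before comparing their speed.
same = sort_based(big).equals(frozenset_based(big))
print("same result:", same)

for func in (sort_based, frozenset_based):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f"{func.__name__}: {seconds:.4f} s per call")
```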

answered Sep 17 '22 12:09

BENY

You can convert each row's pair of names to a frozenset and use pd.Series.duplicated.

res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]

print(res)

  Name1 Name2  Value
0  Juan   Ale      1

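frozenset is necessary instead of set because duplicated uses hashing to check for duplicates, and a mutable set is not hashable.

This approach scales better with the number of columns than with the number of rows, since apply with axis=1 builds one frozenset per row. For a large number of rows, use @Wen's sort-based algorithm (the np.sort answer above).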
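
To make the hashing point concrete, here is a small sketch under the same assumptions as the question's data. It checks that a plain set cannot be hashed, then uses frozenset keys to drop the reversed pair. The names set_is_hashable and keys are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Name1": ["Juan", "Ale"], "Name2": ["Ale", "Juan"], "Value": [1, 1]})

# A plain set is mutable, so Python refuses to hash it.
try:
    hash({"Juan", "Ale"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print("set hashable:", set_is_hashable)

# frozenset is hashable and ignores order, so both rows produce equal keys.
keys = df[["Name1", "Name2"]].apply(frozenset, axis=1)
print("keys equal:", keys.iloc[0] == keys.iloc[1])

# Keep only the first row for each unordered pair.
res = df[~keys.duplicated()]
print(res)
```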

answered Sep 19 '22 12:09

jpp
