 

What is a good pythonic way of finding duplicate objects?

I frequently use sorted and groupby to find duplicate items in an iterable. Now I see that it is unreliable:

from itertools import groupby
data = 3 * ('x ', (1,), u'x')
duplicates = [k for k, g in groupby(sorted(data)) if len(list(g)) > 1]
print duplicates
# [] is printed - no duplicates found, as if all 9 values were unique

The reason why the code above fails in Python 2.x is explained here.
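
For reference, a minimal Python 2 sketch of the broken ordering: CPython 2 falls back to comparing type names for mismatched types (except that str and unicode compare by value), so the three values above form a comparison cycle and sorted() cannot group equal items together:

print u'x' < 'x '    # True - str/unicode compared by value: u'x' < u'x '
print 'x ' < (1,)    # True - compared by type name: 'str' < 'tuple'
print (1,) < u'x'    # True - compared by type name: 'tuple' < 'unicode'
# u'x' < 'x ' < (1,) < u'x' is a cycle, so no consistent sort order exists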

What is a reliable pythonic way of finding duplicates?

I looked for similar questions/answers on SO. The best of them is "In Python, how do I take a list and reduce it to a list of duplicates?", but the accepted solution there is not pythonic (it is a procedural multiline for ... if ... add ... else ... add ... return result) and the other solutions are either unreliable (they depend on the transitivity of the "<" operator, which mixed types do not guarantee) or slow (O(n*n)).
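
For context, a sketch along the lines of that accepted set-based solution (the helper name is mine; it assumes hashable items, and it is reliable, just verbose):

def list_duplicates(seq):
    # Track items already seen; anything encountered twice is a duplicate.
    seen = set()
    dupes = set()
    for item in seq:
        if item in seen:
            dupes.add(item)
        else:
            seen.add(item)
    return list(dupes)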

[EDIT] Closed. The accepted answer helped me to summarize more general conclusions in my answer below.

I like to use built-in types to represent e.g. tree structures. This is why I am now wary of mixing them.

asked Apr 20 '12 by hynekcer


1 Answer

Note: Assumes entries are hashable

>>> from collections import Counter
>>> data = 3 * ('x ', (1,), u'x')
>>> [k for k, c in Counter(data).iteritems() if c > 1]
[u'x', 'x ', (1,)]
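
For what it's worth, on Python 3 the same idea would use items() instead of iteritems(). Note that u'x' and 'x' are the same str there, and the mixed-type data would not even sort, while hashing still works; the output order shown assumes Python 3.7+, where Counter preserves insertion order:

>>> from collections import Counter
>>> data = 3 * ('x ', (1,), 'x')
>>> [k for k, c in Counter(data).items() if c > 1]
['x ', (1,), 'x']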
answered Nov 16 '22 by jamylak