Removing duplicate elements from a Python list containing unhashable elements while preserving order?

Question

I have a datastructure like this:

[
[('A', '1'), ('B', '2')],
[('A', '1'), ('B', '2')],
[('A', '4'), ('C', '5')]
]

And I want to obtain this:

[
[('A', '1'), ('B', '2')],
[('A', '4'), ('C', '5')]
]

Is there a good way of doing this while preserving order as shown?

Commands for copy-pasting:

sample = []
sample.append([('A', '1'), ('B', '2')])
sample.append([('A', '1'), ('B', '2')])
sample.append([('A', '4'), ('C', '5')])

Raymond Hettinger · Accepted Answer

This is a somewhat famous question which was well answered by a famous Pythonista long ago: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/

If you can assume equal records are adjacent, there is a recipe in the itertools docs:

from operator import itemgetter
from itertools import groupby, imap

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))

If you can only assume orderable elements, here a variant using the bisect module. Given n inputs with r unique values, its search step costs O(n log r). If a new unique value is found, it is inserted in the seen list for a cost of O(r * r).

from bisect import bisect_left, insort

def dedup(seq):
    'Remove duplicates. Preserve order first seen.  Assume orderable, but not hashable elements'
    result = []
    seen = []
    for x in seq:
        i = bisect_left(seen, x)
        if i == len(seen) or seen[i] != x:
            seen.insert(i, x)
            result.append(x)
    return result

Sven Marnach · Answer

Here is an order-preserving variant of the sort/unique idiom. This will give you O(n log n) performance, provided your items are at least sortable.

def unique(a):
    indices = sorted(range(len(a)), key=a.__getitem__)
    indices = set(next(it) for k, it in 
                  itertools.groupby(indices, key=a.__getitem__))
    return [x for i, x in enumerate(a) if i in indices]

Example (with hashable items for simplicity):

>>> a = ['F', 'J', 'B', 'F', 'V', 'A', 'E', 'U', 'B', 'U', 'Z', 'K']
>>> unique(a)
['F', 'J', 'B', 'V', 'A', 'E', 'U', 'Z', 'K']

Removing duplicate elements from a Python list containing unhashable elements while preserving order?

Tags:

python

Legend

2 Answers

Raymond Hettinger

Sven Marnach

Recent Activity

Donate For Us

Removing duplicate elements from a Python list containing unhashable elements while preserving order?

Tags:

python

Legend

2 Answers

Raymond Hettinger

Sven Marnach

Related questions

Recent Activity

Donate For Us