
Pythonic way of removing reversed duplicates in list

Tags: python, duplicates, python-2.7

I have a list of pairs:

[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]

and I want to remove any duplicates where

[a,b] == [b,a]

So we end up with just

[0, 1], [0, 4], [1, 4]

I can do an inner & outer loop checking for the reverse pair and append to a list if that's not the case, but I'm sure there's a more Pythonic way of achieving the same results.

asked Dec 15 '16 12:12

Mr Mystery Guest

2 Answers

You can normalize each pair with the sorted function and collect the results in a set comprehension, here combined with map. Note that the resulting set does not preserve the order of the original list:

lst = [0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]
data = {tuple(item) for item in map(sorted, lst)}
# {(0, 1), (0, 4), (1, 4)}

or simply without map like this:

data = {tuple(sorted(item)) for item in lst}

Another way is to use a frozenset. Note, however, that this only works if the two elements of each pair are distinct. Like set, frozenset only holds unique values, so a pair such as [1, 1] collapses to a single element and you lose data, which may not be what you want.

To get a list of lists back, you can use list(map(list, result)), where result is the set of tuples. The outer list() call is only needed in Python 3, where map returns an iterator; in Python 2, map already returns a list.
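As a minimal sketch of the points above (the variable names `pairs` and `with_repeat` are only illustrative), the following shows the sorted-tuple approach, the conversion back to a list of lists, and how frozenset collapses a pair whose two elements are equal:

```python
pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

# Normalize each pair to a sorted tuple so [a, b] and [b, a] compare equal.
unique = {tuple(sorted(pair)) for pair in pairs}

# Sets are unordered, so sort the result to get a stable list of lists.
result = sorted(map(list, unique))
print(result)  # [[0, 1], [0, 4], [1, 4]]

# A pair with two equal elements survives as a tuple but not as a frozenset.
with_repeat = [[1, 1], [2, 3], [3, 2]]
as_tuples = sorted({tuple(sorted(p)) for p in with_repeat})
as_frozensets = {frozenset(p) for p in with_repeat}
print(as_tuples)                       # [(1, 1), (2, 3)]
print(frozenset({1}) in as_frozensets)  # True: [1, 1] became a one-element set
```

Sorting the final result is an extra step added here only to make the output deterministic; drop it if order does not matter.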

answered Sep 19 '22 17:09

styvane

If you only want to remove reversed pairs and don't want external libraries, you could use a simple generator function (loosely based on the itertools "unique_everseen" recipe):

def remove_reversed_duplicates(iterable):
    # Create a set for already seen elements
    seen = set()
    for item in iterable:
        # Lists are mutable so we need tuples for the set-operations.
        tup = tuple(item)
        if tup not in seen:
            # If the tuple is not in the set, add it in REVERSED order.
            seen.add(tup[::-1])
            # If you also want to remove normal duplicates uncomment the next line
            # seen.add(tup)
            yield item

>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> list(remove_reversed_duplicates(a))
[[0, 1], [0, 4], [1, 4]]

The generator function is likely a fast way to solve this problem because set lookups are cheap. This approach also keeps the order of your initial list and only removes reversed duplicates, while being faster than most of the alternatives (see the timings below).
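To illustrate the difference the commented-out line makes, here is a sketch of the same generator with that choice exposed as a parameter. The name `dedupe_pairs` and the `drop_identical` flag are hypothetical and not part of the original answer:

```python
def dedupe_pairs(iterable, drop_identical=False):
    """Yield pairs in input order, skipping pairs whose reverse was already yielded."""
    seen = set()
    for item in iterable:
        tup = tuple(item)
        if tup not in seen:
            # Remember the reversed pair so a later [b, a] is skipped.
            seen.add(tup[::-1])
            if drop_identical:
                # Also remember the pair itself so a later [a, b] is skipped.
                seen.add(tup)
            yield item


data = [[0, 1], [1, 0], [0, 1], [4, 1]]
print(list(dedupe_pairs(data)))                       # [[0, 1], [0, 1], [4, 1]]
print(list(dedupe_pairs(data, drop_identical=True)))  # [[0, 1], [4, 1]]
```

With the default setting the second [0, 1] is kept, because only reversed pairs are recorded as seen.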

If you don't mind using an external library and you want to remove all duplicates (reversed and identical), an alternative is iteration_utilities.unique_everseen:

>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(a, key=set))
[[0, 1], [0, 4], [1, 4]]

This checks whether any item has the same contents, in arbitrary order (hence key=set), as an earlier one. In this case it works as expected, but it also removes duplicate [a, b] occurrences, not only [b, a] ones. You could also use key=sorted (as the other answers suggest). Used like this, unique_everseen has poor algorithmic complexity because the result of the key function is not hashable, so the fast lookup is replaced by a slow one. To speed this up you need to make the keys hashable, for example by converting them to sorted tuples (as some other answers suggest):

>>> from iteration_utilities import chained
>>> list(unique_everseen(a, key=chained(sorted, tuple)))
[[0, 1], [0, 4], [1, 4]]

Here chained is simply a faster alternative to lambda x: tuple(sorted(x)).
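If installing iteration_utilities is not an option, roughly the same behavior can be sketched with the standard library alone. The function below follows the idea of the itertools "unique_everseen" recipe, but it is a simplified version written for this example (the name `unique_by_key` is made up), and it assumes the key function returns something hashable:

```python
def unique_by_key(iterable, key=None):
    """Yield items in input order, keeping only the first item for each key."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

# Sorted tuples are hashable and treat [a, b] and [b, a] as the same key.
deduped = list(unique_by_key(a, key=lambda pair: tuple(sorted(pair))))
print(deduped)  # [[0, 1], [0, 4], [1, 4]]
```

Unlike the library version, this sketch raises a TypeError for unhashable keys such as a plain set or list, so there is no silent fallback to a slow lookup.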

EDIT: As mentioned by @jpmc26, one could use frozenset, which is hashable, instead of normal sets:

>>> list(unique_everseen(a, key=frozenset))
[[0, 1], [0, 4], [1, 4]]

To get an idea about the performance I did some timeit comparisons for the different suggestions:

>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

>>> %timeit list(remove_reversed_duplicates(a))
100000 loops, best of 3: 16.1 µs per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(set(map(frozenset, a)))
100000 loops, best of 3: 7.23 µs per loop

>>> %timeit list(unique_everseen(a, key=set))
10000 loops, best of 3: 26.4 µs per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10000 loops, best of 3: 25.8 µs per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10000 loops, best of 3: 29.8 µs per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10000 loops, best of 3: 28.5 µs per loop

Long list with many duplicates:

>>> import random
>>> a = [[random.randint(0, 10), random.randint(0,10)] for _ in range(10000)]

>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 12.5 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 10 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 10.4 ms per loop

>>> %timeit list(unique_everseen(a, key=set))
10 loops, best of 3: 47.7 ms per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 22.4 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 24 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 35 ms per loop

And with fewer duplicates:

>>> a = [[random.randint(0, 100), random.randint(0,100)] for _ in range(10000)]

>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 15.4 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 13.1 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 11.8 ms per loop

>>> %timeit list(unique_everseen(a, key=set))
1 loop, best of 3: 1.96 s per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 24.2 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 31.1 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 36.7 ms per loop

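So the variants with remove_reversed_duplicates, unique_everseen(key=frozenset) and set(map(frozenset, a)) seem to be by far the fastest solutions. Which of them is fastest depends on the length of the input and the number of duplicates. Keep in mind that set(map(frozenset, a)) returns an unordered set of frozensets rather than a list of the original pairs.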
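The numbers above come from IPython's %timeit. For readers without IPython, a rough way to reproduce such a comparison with only the standard library is sketched below. The helper name `via_frozenset` and the loop structure are made up for this example, only the two pure-Python approaches are covered, and actual timings will vary by machine and Python version:

```python
import random
import timeit


def remove_reversed_duplicates(iterable):
    # Same generator as in the answer above, repeated to keep this block self-contained.
    seen = set()
    for item in iterable:
        tup = tuple(item)
        if tup not in seen:
            seen.add(tup[::-1])
            yield item


def via_frozenset(pairs):
    # Unordered result; pairs with two equal elements shrink to one element.
    return set(map(frozenset, pairs))


random.seed(0)  # fixed seed so repeated runs time the same input
a = [[random.randint(0, 100), random.randint(0, 100)] for _ in range(10000)]

candidates = [
    ("generator", lambda: list(remove_reversed_duplicates(a))),
    ("frozenset", lambda: via_frozenset(a)),
]
for label, func in candidates:
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print("{}: {:.2f} ms per call".format(label, best * 1e3))
```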

answered Sep 21 '22 17:09

MSeifert
