Looking for more pythonic list comparison solution

Tags:

python

list

Ok so I have two lists:

x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]

I want to compare them in such a way that I get the following output:

x = [3, 4]
y = [1, 5, 6]

The basic idea is to go through each list and compare them. If they have an element in common, remove it from both lists, but only one copy per match, not every copy. If an element has no match in the other list, leave it. Two identical lists would become x = [], y = [].

Here is my rather hacked-up and pretty lame solution. I'm hoping others can recommend a better and/or more Pythonic way of doing this. Three nested loops seem excessive...

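    xlist = [1, 2, 3, 4]
    ylist = [1, 1, 2, 5, 6]

    done = False
    while not done:
        done = True  # stays True only if a full pass finds no common element
        for x in xlist:
            for y in ylist:
                if x == y:
                    # remove one copy of the shared value from each list
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
                    break  # stop iterating over lists we just modified
            if not done:
                break  # restart the scan from the beginning
    print(xlist, ylist)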
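For comparison, a minimal sketch (not part of the original post) that applies the same rule in a single pass over the first list, working on copies so the inputs are left untouched. The name `remove_common` is hypothetical. It is still O(len(x) * len(y)), because `in` and `remove` scan the list, but it keeps both results in their original order.

```python
def remove_common(xs, ys):
    """Return (xs, ys) with one copy of each shared element removed per match.

    Hypothetical helper; the inputs are copied, not modified.
    """
    x_left = []
    y_left = list(ys)
    for v in xs:
        if v in y_left:
            y_left.remove(v)  # drop one matching copy from the y side
        else:
            x_left.append(v)  # no partner left in y, so v survives
    return x_left, y_left


print(remove_common([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
print(remove_common([1, 2, 2], [1, 2, 2]))           # ([], [])
```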

Thanks as always for taking the time to read this question. XOXO

asked Jul 21 '11 00:07

Peach Passion

3 Answers

It's possible that the data structure you are looking for is the multiset (or "bag"), and if so, a good way to implement it in Python is to use collections.Counter:

>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})
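In multiset terms, the elements the two lists share are the intersection `x & y` (the minimum of the two counts for each value), and each answer is what is left after taking that shared part away. A small illustration with the same two Counters:

```python
from collections import Counter

x = Counter([1, 2, 3, 4])
y = Counter([1, 1, 2, 5, 6])

common = x & y  # keeps min(count in x, count in y) for each element
print(common)   # Counter({1: 1, 2: 1})

# Removing the shared part from each side gives the same result as x - y and y - x,
# because Counter subtraction drops counts that fall to zero or below.
print(x - common == x - y)  # True
print(y - common == y - x)  # True
```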

If you want to convert the Counter objects back to lists with multiplicity, you can use the elements method:

>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]
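Putting both steps together, a minimal sketch of a helper that takes two plain lists and returns both remainders as lists; the name `multiset_difference` is hypothetical. Identical inputs give two empty lists, as the question requires.

```python
from collections import Counter


def multiset_difference(xs, ys):
    """Return (xs minus ys, ys minus xs) as lists, treating the inputs as multisets."""
    cx, cy = Counter(xs), Counter(ys)
    # On Python 3.7+, elements() yields values in the order they were first seen.
    return list((cx - cy).elements()), list((cy - cx).elements())


print(multiset_difference([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
print(multiset_difference([4, 5, 5], [4, 5, 5]))           # ([], [])
```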

answered Sep 27 '22 03:09

Gareth Rees

If you don't care about order, use collections.Counter to do it in one line:

>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})

>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})
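Why order can matter: `elements()` yields all copies of one value together (on Python 3.7+, in the order each value was first seen), so a list whose duplicates are interleaved comes back regrouped. A quick check with illustrative values that are not from the question:

```python
from collections import Counter

seq = [2, 1, 2, 3]
rebuilt = list((Counter(seq) - Counter([3])).elements())
print(rebuilt)  # [2, 2, 1]; an order-preserving removal would give [2, 1, 2]
```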

If you care about order, you can probably iterate through your lists grabbing elements from the above dictionaries:

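from collections import Counter

def prune(seq, toPrune):
    """Prunes elements from the front of seq in O(N) time.

    One copy of each value is dropped per matching copy in toPrune;
    the surviving (last) copies keep their original order.
    """
    remainder = Counter(seq) - Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder[x] > 0:  # a Counter returns 0 for missing keys
            remainder[x] -= 1
            R.append(x)
    R.reverse()  # append + one reverse stays O(N); insert(0, x) would be O(N**2)
    return R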

Demo:

>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 5, 6]
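The same result can be had with a single forward pass that spends a removal budget taken from `Counter(toPrune)`, with no reversal step. This is a minimal sketch, and `prune_forward` is a hypothetical name. It drops the earliest copies of each matched value, which is exactly the complement of what `prune` keeps (the last copies).

```python
from collections import Counter


def prune_forward(seq, to_prune):
    """Drop one leading copy of a value per occurrence in to_prune; O(N)."""
    budget = Counter(to_prune)  # how many copies of each value may still be removed
    kept = []
    for v in seq:
        if budget[v] > 0:
            budget[v] -= 1  # spend one removal on this copy
        else:
            kept.append(v)
    return kept


x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]
print(prune_forward(x, y))  # [3, 4]
print(prune_forward(y, x))  # [1, 5, 6]
```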

answered Sep 27 '22 03:09

ninjagecko

To build on Gareth's answer:

>>> from collections import Counter
>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> list((a - b).elements())
[3, 4]
>>> list((b - a).elements())
[1, 5, 6]

Benchmark code:

from collections import Counter
from collections import defaultdict
import random

# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]

# long lists
a = []
b = []

for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)

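# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot:
# removing items from lists while iterating over them can raise ValueError,
# e.g. original([1], [1, 2, 1]).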
def original(xlist, ylist):
    done = False

    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist # not strictly necessary, see above


def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    # note: returns lazy iterators, not lists like the other functions
    return ((x-y).elements(), (y-x).elements())


def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])

    return x, y


def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]

# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))

Running the Insufficiently Rigorous Benchmarks[tm] (short lists are the original ones, long lists are randomly generated lists approximately 1000 entries long with a mix of matches and repeats; times are given as multiples of the Original algorithm's time, so lower is faster):

                 100K iterations,   1K iterations,
                 short lists        long lists
    Original     1.0                1.0
    Counter      9.3                0.06
    Nasty        2.9                1.4
    Gnibbler     2.4                0.02

Note 1: The creation of the Counter object seems to overshadow the actual algorithm at small list sizes.

Note 2: Original and gnibbler break even at list lengths of approximately 35; above that, gnibbler (and Counter) are faster.
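gnibbler's single-dictionary idea can also be written with `Counter.subtract`, which, unlike the `-` operator, keeps zero and negative counts. A sketch follows; the name `counter_subtract` is hypothetical and was not part of the benchmark above, and `a`/`b` are the short lists from the commented-out section of the benchmark code. It is timed with the standard `timeit` module rather than a bare loop, and no timings are claimed here.

```python
import timeit
from collections import Counter


def counter_subtract(xlist, ylist):
    """One tally: positive counts belong to xlist's remainder, negative to ylist's."""
    d = Counter(xlist)
    d.subtract(ylist)  # counts may go to zero or below; subtract() keeps them
    xs = [k for k, v in d.items() for _ in range(v)]   # range(v) is empty for v <= 0
    ys = [k for k, v in d.items() for _ in range(-v)]  # range(-v) is empty for v >= 0
    return xs, ys


a = [1, 2, 3, 4, 7, 8, 9]
b = [1, 1, 2, 5, 6, 8, 8, 10]
print(counter_subtract(a, b))  # ([3, 4, 7, 9], [1, 8, 5, 6, 10])

# Time 10,000 calls; the number depends on the machine and Python version.
elapsed = timeit.timeit(lambda: counter_subtract(a, b), number=10_000)
print(f"counter_subtract: {elapsed:.4f} s for 10,000 calls")
```

As with gnibbler's version, the results come out in dictionary order rather than the original list order (note `8` before `5` in the second list).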

answered Sep 26 '22 03:09

Scott A
