compare two lists to get non matching elements

Question

I checked this comparing lists, Only one answer is relative to what I am trying to do. I have to lists with some similiar elements, I want to get the non -matching elements.

len(h) = 1973182  #h[0] = 'B00006J8F4F2', y[0] = 'B0075Y2X2GO6'
len(y) = 656890

I am doing

new_list = [i for i in h if i not in y],however this takes about 13 minutes to do, Is there a faster way of doing this?

In refer to "duplicate" question, Finding elements not in a list, I use the same code, What I am looking for is a faster way of doing it.

yatu · Accepted Answer

You can use sets to more efficiently find the difference between both lists. If you need to keep the order in the original list you can use sorted with a key.

We want to sort the elements in the set according to their appearance in the original list, so one way is to build a lookup dictionary. We can use enumerate for that. Then we only need to lookup on the dictionary as a key function:

d = {j:i for i,j in enumerate(h)}
new_list  = sorted(list((set(h) - set(y))), key = lambda x: d[x])

Let's try with a simple example:

y = range(5)
h = range(7)
d = {j:i for i,j in enumerate(h)}
sorted(list((set(h) - set(y))), key = lambda x: d[x])
# [5, 6]

Timings -

import random
y = random.sample(range(1, 10001), 10000)
h = random.sample(range(1, 20001), 10000)

%timeit [i for i in h if i not in y]
# 1.28 s ± 37.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def using_sets(a,b):
    d = {j:i for i,j in enumerate(a)}
    sorted(list((set(a) - set(b))), key = lambda x: d[x])

%timeit using_sets(h,y)
# 6.16 ms ± 373 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So there's a clear improvement, with the proposed approach performing up to 200 times faster.

doctorlove · Answer

The answer you linked to suggests using sets, because they use hashes to look thing up quickly. With lists, and in, like

new_list = [i for i in h if i not in y]

the whole of list y needs checking each time for each i in h.

You could use sets, but as has been pointed out need to be careful with duplicates getting lost.

You could use a Counter:

from collections import Counter

the with two lists, say

l1 = [1,1,2,3,4]
l2 = [3,3,4,5,6]

for examples' sake, can use fed into a Counter each

>>> Counter(l1)
Counter({1: 2, 2: 1, 3: 1, 4: 1})
>>> Counter(l2)
Counter({3: 2, 4: 1, 5: 1, 6: 1})

This just walks each list once. Subtracting them gives what's in the first but not the second:

>>> Counter(l1)-Counter(l2)
Counter({1: 2, 2: 1})

The elements tell you what you want

>>> diff = Counter(l1)-Counter(l2)
>>> list(diff.elements())
[1, 1, 2]

compare two lists to get non matching elements

Tags:

python

list

programmerwiz32

2 Answers

yatu

doctorlove

Recent Activity

Donate For Us

compare two lists to get non matching elements

Tags:

python

list

programmerwiz32

2 Answers

yatu

doctorlove

Related questions

Recent Activity

Donate For Us