Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get n elements of a list not contained in another one?

Tags:

python

I have two lists, of different size (either one can be larger than the other one), with some common elements. I would like to get n elements from the first list which are not in the second one.

I see two families of solutions (the example below is for n=3)

a = [i for i in range(2, 10)]
b = [i * 2 for i in range (1, 10)]
# [2, 3, 4, 5, 6, 7, 8, 9] [2, 4, 6, 8, 10, 12, 14, 16, 18]

# solution 1: generate the whole list, then slice
s1 = list(set(a) - set(b))
s2 = [i for i in a if i not in b]

for i in [s1, s2]:
    print (i[:3])

# solution 2: the simple loop solution
c = 0
s3 = []
for i in a:
    if i not in b:
        s3.append(i)
        c += 1
        if c == 3:
            break
print(s3)

All of the them are correct, the output is

[9, 3, 5]
[3, 5, 7]
[3, 5, 7]

(the first solution does not give the first 3 ones because set does not preserve the order - but this is OK in my case as I will have unsorted (even explicitly shuffled) lists anyway)

Are there the most pythonic and reasonably optimal ones?

The solution 1 first computes the difference, then slices - which I find quite inefficient (the sizes of my lists will be ~100k elements, I will be looking for the first 100 ones).

The solution 2 looks more optimal but it is ugly (which is a matter of taste, but I learned that when something looks ugly in Python, it means that there are usually more pythonic solution).

I will settle for solution 2 if there are no better alternatives.

like image 590
WoJ Avatar asked Mar 15 '23 14:03

WoJ


1 Answers

I would use set.difference and slice:

print(list(set(a).difference(b))[:3])
[3, 5, 7]

set.difference already gives you elements in a that are not in b:

set([3, 5, 7, 9])

So you just need a slice of that.

Or without calling list in the set use iter, next and a comprehension:

diff = iter(set(a).difference(b))
n = 3
sli = [next(diff) for _ in range(n)]
print(sli)

.difference does not create a second set so it is a more efficient solution:

In [1]: a = [i for i in range(2, 10000000)]  
In [2]: b = [i * 2 for i in range (1, 10000000)]   
In [3]: timeit set(a).difference(b)
1 loops, best of 3: 848 ms per loop    
In [4]: timeit set(a)- set(b)
1 loops, best of 3: 1.54 s per loop

For the large lists above s2 = [i for i in a if i not in b] would give you enough time to cook a meal before it finished.

Using iter and .difference:

In [11]: %%timeit                                
diff = iter(set(a).difference(b))
n = 3
sli = [next(diff) for _ in range(n)]
   ....: 
1 loops, best of 3: 797 ms per loop
like image 107
Padraic Cunningham Avatar answered Apr 01 '23 14:04

Padraic Cunningham