
Get difference between two lists

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with the items from the first list that aren't present in the second one. For the example above, I need to get:

temp3 = ['Three', 'Four'] 

Is there a fast way to do this without explicit loops and membership checks?

Max Frai asked Aug 11 '10 19:08



2 Answers

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]
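For reuse, the two lines above can be wrapped in a small helper. This is only a sketch: the name `ordered_difference` is made up for illustration, and it assumes the items are hashable (strings, numbers, tuples), since they are put into a set.

```python
def ordered_difference(first, second):
    """Return the items of `first` that are not in `second`, in `first`'s order."""
    exclude = set(second)  # average O(1) membership tests; items must be hashable
    return [item for item in first if item not in exclude]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
temp3 = ordered_difference(temp1, temp2)
print(temp3)  # ['Three', 'Four']
```

Unlike the set subtraction, this keeps duplicates from the first list: every occurrence of an item that is missing from the second list is retained.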

Performance test

import timeit

init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number=100000))
print(timeit.timeit('s = set(temp2); [x for x in temp1 if x not in s]', init, number=100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number=100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

Besides preserving order, the method I presented is also (slightly) faster than the set subtraction because it doesn't need to construct an unnecessary set. The performance difference would be more noticeable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
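To rerun the second test on your own machine, something like the following should work. It is a sketch, not the exact script behind the numbers above: the labels and the `run_benchmark` helper are made up here, and the default repeat count is deliberately small, so absolute timings will not match the results shown.

```python
import timeit

# Same setup as the second test: a long first list and a short second list.
SETUP = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

# The three approaches being compared (labels are illustrative).
STATEMENTS = {
    'set subtraction': 'list(set(temp1) - set(temp2))',
    'comprehension with set lookup': 's = set(temp2); [x for x in temp1 if x not in s]',
    'comprehension with list lookup': '[item for item in temp1 if item not in temp2]',
}


def run_benchmark(number=5):
    """Time each statement `number` times and return total seconds per label."""
    return {
        label: timeit.timeit(stmt, SETUP, number=number)
        for label, stmt in STATEMENTS.items()
    }


if __name__ == '__main__':
    for label, seconds in run_benchmark().items():
        print('{:<32} {:.4f} s'.format(label, seconds))
```

Only the relative ordering is worth comparing between runs; the absolute values depend on the Python version and hardware.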
Mark Byers answered Sep 24 '22 02:09


To get elements which are in temp1 but not in temp2:

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that it is asymmetric:

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1])

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you can use set([1, 2]).symmetric_difference(set([2, 3])).
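Here is a short sketch of the difference between the two operations. The `^` operator is the built-in shorthand for `symmetric_difference`; the helper `ordered_symmetric_difference` is a made-up name, added only in case you want the result as a list in input order.

```python
a = [1, 2]
b = [2, 3]

only_in_a = set(a) - set(b)     # one-directional: items of a missing from b
only_in_b = set(b) - set(a)     # reversing the operands gives a different result
in_exactly_one = set(a) ^ set(b)  # same as set(a).symmetric_difference(set(b))


def ordered_symmetric_difference(first, second):
    """Items found in exactly one list: first's leftovers, then second's."""
    in_first, in_second = set(first), set(second)
    return ([x for x in first if x not in in_second] +
            [x for x in second if x not in in_first])


print(only_in_a, only_in_b, in_exactly_one)
print(ordered_symmetric_difference(a, b))  # [1, 3]
```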

ars answered Sep 23 '22 02:09