You can use sets:
main_list = list(set(list_2) - set(list_1))
Output:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']
Per @JonClements' comment, here is a tidier version:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
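Note that a set has no defined order, so the order of the result above can differ between runs and Python versions. A minimal sketch, assuming a sorted result is acceptable for your data:

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# set.difference accepts any iterable, so list_1 need not be converted first.
# sorted() turns the unordered set into a list with a reproducible order.
main_list = sorted(set(list_2).difference(list_1))
print(main_list)  # ['f', 'm']
```

Like the versions above, this drops duplicates and does not keep the original order of list_2.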
TL;DR:
SOLUTION (1)
import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
SOLUTION (2) You want a sorted list
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)
EXPLANATIONS:
(1) You can use NumPy's setdiff1d(array1, array2, assume_unique=False).
assume_unique tells the function whether the arrays are already unique. If False, the unique elements are determined first. If True, the function assumes that the elements are already unique and skips determining them.
This yields the unique values in array1 that are not in array2. assume_unique is False by default.
If you are concerned with the unique elements (based on the response of Chinny84), then simply use the default value (assume_unique=False):
import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
(2) For those who want answers to be sorted, I've made a custom function:
import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
To get the answer, run:
main_list = setdiff_sorted(list_2,list_1)
SIDE NOTES:
(a) Solution 2 (the custom function setdiff_sorted) returns a list, whereas solution 1 returns an array.
(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. What can be an example of a complication? See note (c).
(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"]
Keeping the default value of assume_unique yields ["f", "m"] (in both solutions). However, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because the user told the function to assume the elements are unique, so no deduplication takes place. Hence, it is better to keep assume_unique at its default value. Note that both answers are sorted.
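The effect of duplicates described in note (c) can also be seen without NumPy. The following is only a sketch in plain Python, not what setdiff1d does internally: the set-based version drops the repeated "m", while a filter over the original list keeps it.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # "m" appears twice

# Deduplicated and sorted, comparable to the default assume_unique=False.
unique_diff = sorted(set(list_2) - set(list_1))

# Duplicates kept: every element of list_2 is tested on its own.
exclude = set(list_1)
kept_diff = [item for item in list_2 if item not in exclude]

print(unique_diff)  # ['f', 'm']
print(kept_diff)    # ['f', 'm', 'm']
```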
Not sure why the above explanations are so complicated when you have native methods available:
main_list = list(set(list_2)-set(list_1))
Use a list comprehension like this:
main_list = [item for item in list_2 if item not in list_1]
Output:
>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"]
>>>
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']
Edit:
As mentioned in the comments below, with large lists, the above is not the ideal solution. When that's the case, a better option would be converting list_1 to a set first:
set_1 = set(list_1) # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]
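If the inputs may contain duplicates and you want each surviving item only once while still keeping the order of list_2, one option is sketched below. The helper name ordered_difference is made up for this illustration, and it assumes the items are hashable.

```python
def ordered_difference(items, exclude):
    """Return items not in exclude, in first-seen order, without repeats."""
    exclude_set = set(exclude)  # O(1) average membership tests
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(x for x in items if x not in exclude_set))


list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["m", "a", "f", "c", "m"]
print(ordered_difference(list_2, list_1))  # ['m', 'f']
```

Building the set once costs O(len(exclude)), after which each membership test is O(1) on average.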
If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:
from itertools import filterfalse
main_list = list(filterfalse(set(list_1).__contains__, list_2))
This takes advantage of the fact that functional-style tools accept a callback at construction, so the callback is created once and reused for every element without needing to be stored anywhere else (filterfalse stores it internally); list comprehensions and generator expressions can do this, but it's ugly.†
That gets the same results in a single line as:
main_list = [x for x in list_2 if x not in list_1]
with the speed of:
set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]
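As a quick check that the three variants agree, here is a small sketch using the sample lists from the question (the variable names are chosen only for this illustration):

```python
from itertools import filterfalse

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# One-liner: the bound method set.__contains__ is built once and reused.
via_filterfalse = list(filterfalse(set(list_1).__contains__, list_2))

# Same result, but each "in" test scans list_1 linearly.
via_list_scan = [x for x in list_2 if x not in list_1]

# Same result with hash lookups, at the cost of a named temporary.
set_1 = set(list_1)
via_set_lookup = [x for x in list_2 if x not in set_1]

print(via_filterfalse)  # ['f', 'm']
```

All three keep the order of list_2 and keep duplicates; only the cost of each membership test differs.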
Of course, if the comparisons are intended to be positional, so:
list_1 = [1, 2, 3]
list_2 = [2, 3, 4]
should produce:
main_list = [2, 3, 4]
(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick's answer, which involves no temporary lists or sets (even with sets being roughly O(1), they have a higher "constant" factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer, and if your problem is position sensitive, is the only correct solution when matching elements appear at mismatched offsets.
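That answer is not reproduced here. As a rough sketch of what a positional comparison can look like, assuming a zip-based approach (which may differ from Patrick's actual code):

```python
list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

# zip pairs items by index and stops at the shorter list,
# so this does O(min(n, m)) equality checks and builds no set.
main_list = [b for a, b in zip(list_1, list_2) if a != b]
print(main_list)  # [2, 3, 4]
```

In this sketch, items of list_2 beyond the length of list_1 are ignored, because zip stops at the shorter input.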
†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the "outermost" loop, e.g.:
main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]
which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn't matter, because Python 2 doesn't use closures for list comprehensions; they operate in the same scope they're used in).
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
for i in list_2:
    if i not in list_1:
        main_list.append(i)
print(main_list)
output:
['f', 'm']
I tried two methods and found one more useful than the other. Here is my answer:
My input data:
crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']
Method1: np.setdiff1d
I prefer this approach because it returns the result in sorted order, which for this input matches the original order
import numpy as np
test = np.setdiff1d(testmod_mpp, crkmod_mpp).tolist()
print(test)
['M14', 'M15', 'M16', 'M17', 'M20', 'M21', 'M22', 'M23']
Method2: This gives the same elements as Method1, but in an arbitrary order
test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']
Method1 (np.setdiff1d) meets my requirements perfectly.
This answer is for information.