I have two Python lists with the same number of elements. The elements of the first list are unique, the ones in the second list - not necessarily so. For instance
list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']
I want to remove all the "first encountered" elements from the second list and their corresponding elements from the first list. Basically, this means removing all unique elements and the first element of the duplicates. With the above example, the correct result should be
>>>list1
['e3', 'e5', 'e6']
>>>list2
['h1', 'h1', 'h2']
That is, the element 'e1' was removed because its corresponding 'h1' was encountered for the first time, 'e2' was removed because 'h2' was seen for the first time, 'e3' was left because 'h1' was already seen, 'e4' was removed because 'h3' was seen for the first time, 'e5' was left because 'h1' was already seen, 'e6' was left because 'h2' was already seen, and 'e7' was removed because 'h4' was seen for the first time.
What would be an efficient way to solve this problem? The lists could contain thousands of elements, so I'd rather not make duplicates of them or run multiple loops, if possible.
Just use a set
object to lookup if the current value is already seen, like this
>>> list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
>>> list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']
>>>
>>> def filterer(l1, l2):
... r1 = []
... r2 = []
... seen = set()
... for e1, e2 in zip(l1, l2):
... if e2 not in seen:
... seen.add(e2)
... else:
... r1.append(e1)
... r2.append(e2)
... return r1, r2
...
>>> list1, list2 = filterer(list1, list2)
>>> list1
['e3', 'e5', 'e6']
>>> list2
['h1', 'h1', 'h2']
If you are going to consume the elements one-by-one and if the input lists are pretty big, then I would recommend making a generator, like this
>>> def filterer(l1, l2):
... seen = set()
... for e1, e2 in zip(l1, l2):
... if e2 not in seen:
... seen.add(e2)
... else:
... yield e1, e2
...
>>> list(filterer(list1, list2))
[('e3', 'h1'), ('e5', 'h1'), ('e6', 'h2')]
>>>
>>> zip(*filterer(list1, list2))
[('e3', 'e5', 'e6'), ('h1', 'h1', 'h2')]
I might be code golfing here but I find this interesting:
list1_new = [x for i, x in enumerate(list1) if list2[i] in list2[:i]]
print(list1_new)
# prints ['e3', 'e5', 'e6']
What happens here in case you are not familiar with list comprehensions is the following (reading it from the end):
i
of list2
exists in a slicing of list2
that includes all previous elements list2[:i]
.list1
(x
) and I store it in the new list I am creating list1_new
An efficient way would be to use a set
, which contains all already seen keys. A set
will guarantee you an average lookup of O(1)
.
So something like this should work:
s = set()
result1 = []
result2 = []
for x, y in zip(list1, list2):
if y in s:
result1.append(x)
result2.append(y)
else:
s.add(y)
Notice, this will create a new list. Shouldn't be a big problem though, since Python doesn't actually copy the strings, but only creates a pointer to the original string.
Use a set to keep track of values you've already encountered:
seen= set()
index= 0
while index < len(list1):
i1, i2= list1[index], list2[index]
if i2 in seen:
index+= 1
else:
seen.add(i2)
del list1[index]
del list2[index]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With