Remove first encountered elements from a list

Question

I have two Python lists with the same number of elements. The elements of the first list are unique, the ones in the second list - not necessarily so. For instance

list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']

I want to remove all the "first encountered" elements from the second list and their corresponding elements from the first list. Basically, this means removing all unique elements and the first element of the duplicates. With the above example, the correct result should be

>>>list1
['e3', 'e5', 'e6']
>>>list2
['h1', 'h1', 'h2']

That is, the element 'e1' was removed because its corresponding 'h1' was encountered for the first time, 'e2' was removed because 'h2' was seen for the first time, 'e3' was left because 'h1' was already seen, 'e4' was removed because 'h3' was seen for the first time, 'e5' was left because 'h1' was already seen, 'e6' was left because 'h2' was already seen, and 'e7' was removed because 'h4' was seen for the first time.

What would be an efficient way to solve this problem? The lists could contain thousands of elements, so I'd rather not make duplicates of them or run multiple loops, if possible.

thefourtheye · Accepted Answer

Just use a set object to lookup if the current value is already seen, like this

>>> list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
>>> list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']
>>>
>>> def filterer(l1, l2):
...     r1 = []
...     r2 = []
...     seen = set()
...     for e1, e2 in zip(l1, l2):
...         if e2 not in seen:
...             seen.add(e2)
...         else:
...             r1.append(e1)
...             r2.append(e2)
...     return r1, r2
...
>>> list1, list2 = filterer(list1, list2)
>>> list1
['e3', 'e5', 'e6']
>>> list2
['h1', 'h1', 'h2']

If you are going to consume the elements one-by-one and if the input lists are pretty big, then I would recommend making a generator, like this

>>> def filterer(l1, l2):
...     seen = set()
...     for e1, e2 in zip(l1, l2):
...         if e2 not in seen:
...             seen.add(e2)
...         else:
...             yield e1, e2
...
>>> list(filterer(list1, list2))
[('e3', 'h1'), ('e5', 'h1'), ('e6', 'h2')]
>>>
>>> zip(*filterer(list1, list2))
[('e3', 'e5', 'e6'), ('h1', 'h1', 'h2')]

Ma0 · Answer

I might be code golfing here but I find this interesting:

list1_new = [x for i, x in enumerate(list1) if list2[i] in list2[:i]]
print(list1_new)
# prints ['e3', 'e5', 'e6']

What happens here in case you are not familiar with list comprehensions is the following (reading it from the end):

I am checking whether element i of list2 exists in a slicing of list2 that includes all previous elements list2[:i].
if it does then I capture the corresponding element from list1 (x) and I store it in the new list I am creating list1_new

Jakube · Answer

An efficient way would be to use a set, which contains all already seen keys. A set will guarantee you an average lookup of O(1).

So something like this should work:

s = set()
result1 = []
result2 = []
for x, y in zip(list1, list2):
    if y in s:
        result1.append(x)
        result2.append(y)
    else:
        s.add(y)

Notice, this will create a new list. Shouldn't be a big problem though, since Python doesn't actually copy the strings, but only creates a pointer to the original string.

Aran-Fey · Answer

Use a set to keep track of values you've already encountered:

seen= set()
index= 0
while index < len(list1):
    i1, i2= list1[index], list2[index]
    if i2 in seen:
        index+= 1
    else:
        seen.add(i2)
        del list1[index]
        del list2[index]

Remove first encountered elements from a list

Tags:

python

list

bontchev

4 Answers

thefourtheye

Ma0

Jakube

Aran-Fey

Recent Activity

Donate For Us

Remove first encountered elements from a list

Tags:

python

list

bontchev

4 Answers

thefourtheye

Ma0

Jakube

Aran-Fey

Related questions

Recent Activity

Donate For Us