Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove first encountered elements from a list

Tags:

python

list

I have two Python lists with the same number of elements. The elements of the first list are unique, the ones in the second list - not necessarily so. For instance

list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']

I want to remove all the "first encountered" elements from the second list and their corresponding elements from the first list. Basically, this means removing all unique elements and the first element of the duplicates. With the above example, the correct result should be

>>>list1
['e3', 'e5', 'e6']
>>>list2
['h1', 'h1', 'h2']

That is, the element 'e1' was removed because its corresponding 'h1' was encountered for the first time, 'e2' was removed because 'h2' was seen for the first time, 'e3' was left because 'h1' was already seen, 'e4' was removed because 'h3' was seen for the first time, 'e5' was left because 'h1' was already seen, 'e6' was left because 'h2' was already seen, and 'e7' was removed because 'h4' was seen for the first time.

What would be an efficient way to solve this problem? The lists could contain thousands of elements, so I'd rather not make duplicates of them or run multiple loops, if possible.

like image 436
bontchev Avatar asked Aug 17 '16 08:08

bontchev


4 Answers

Just use a set object to lookup if the current value is already seen, like this

>>> list1 = ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7']
>>> list2 = ['h1', 'h2', 'h1', 'h3', 'h1', 'h2', 'h4']
>>>
>>> def filterer(l1, l2):
...     r1 = []
...     r2 = []
...     seen = set()
...     for e1, e2 in zip(l1, l2):
...         if e2 not in seen:
...             seen.add(e2)
...         else:
...             r1.append(e1)
...             r2.append(e2)
...     return r1, r2
...
>>> list1, list2 = filterer(list1, list2)
>>> list1
['e3', 'e5', 'e6']
>>> list2
['h1', 'h1', 'h2']

If you are going to consume the elements one-by-one and if the input lists are pretty big, then I would recommend making a generator, like this

>>> def filterer(l1, l2):
...     seen = set()
...     for e1, e2 in zip(l1, l2):
...         if e2 not in seen:
...             seen.add(e2)
...         else:
...             yield e1, e2
...
>>> list(filterer(list1, list2))
[('e3', 'h1'), ('e5', 'h1'), ('e6', 'h2')]
>>>
>>> zip(*filterer(list1, list2))
[('e3', 'e5', 'e6'), ('h1', 'h1', 'h2')]
like image 60
thefourtheye Avatar answered Oct 13 '22 17:10

thefourtheye


I might be code golfing here but I find this interesting:

list1_new = [x for i, x in enumerate(list1) if list2[i] in list2[:i]]
print(list1_new)
# prints ['e3', 'e5', 'e6']

What happens here in case you are not familiar with list comprehensions is the following (reading it from the end):

  • I am checking whether element i of list2 exists in a slicing of list2 that includes all previous elements list2[:i].
  • if it does then I capture the corresponding element from list1 (x) and I store it in the new list I am creating list1_new
like image 34
Ma0 Avatar answered Oct 13 '22 16:10

Ma0


An efficient way would be to use a set, which contains all already seen keys. A set will guarantee you an average lookup of O(1).

So something like this should work:

s = set()
result1 = []
result2 = []
for x, y in zip(list1, list2):
    if y in s:
        result1.append(x)
        result2.append(y)
    else:
        s.add(y)

Notice, this will create a new list. Shouldn't be a big problem though, since Python doesn't actually copy the strings, but only creates a pointer to the original string.

like image 23
Jakube Avatar answered Oct 13 '22 17:10

Jakube


Use a set to keep track of values you've already encountered:

seen= set()
index= 0
while index < len(list1):
    i1, i2= list1[index], list2[index]
    if i2 in seen:
        index+= 1
    else:
        seen.add(i2)
        del list1[index]
        del list2[index]
like image 39
Aran-Fey Avatar answered Oct 13 '22 16:10

Aran-Fey