How to remove list items depending on their predecessor in Python

Tags:

python

list

Given a Python list, I want to remove consecutive 'duplicates'. However, the value that decides whether two items are duplicates is an attribute of the list item (in this example, the tuple's first element).

Input:

[(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

Desired Output:

[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]

I cannot use a set or dict, because order is important.

I cannot use a list comprehension such as [x for x in somelist if not determine(x)], because the check depends on the predecessor.

What I want is something like:

mylist = [...]

for i in range(len(mylist)):
    if mylist[i-1].attr == mylist[i].attr:
        mylist.remove(i)

What is the preferred way to solve this in Python?

Sparkofska asked Apr 17 '19 08:04



3 Answers

You can use itertools.groupby (demonstration with more data):

from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

[next(group) for key, group in groupby(data, key=itemgetter(0))]

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
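
The question's pseudocode compares an attribute (.attr) rather than a tuple index. As a minimal sketch of the same groupby idea for that case, operator.attrgetter can supply the key. The Item class and its field names here are hypothetical, chosen only to mirror the question:

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Item:
    # Hypothetical record type; "attr" mirrors the attribute name used in the question.
    attr: int
    label: str


items = [Item(1, "a"), Item(2, "b"), Item(2, "c"), Item(3, "d"), Item(2, "e")]

# Keep the first item of every run of consecutive items that share the same attr.
deduped = [next(group) for _, group in groupby(items, key=attrgetter("attr"))]

print([(item.attr, item.label) for item in deduped])
```

Because groupby only merges adjacent items, the later Item(2, "e") survives, which is the behaviour the question asks for.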

For completeness, an iterative approach based on other answers:

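result = []

for first, second in zip(data, data[1:]):
    if first[0] != second[0]:
        result.append(first)

# zip stops one item short, so the final item (always the last of its run) is added here
if data:
    result.append(data[-1])

result

Output:

[(1, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]

Note that this keeps the last item of each run of duplicates, instead of the first.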
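
If the first item of each run should be kept instead, the same pairing can be turned around so that each item is compared with its predecessor. This is only a sketch of one possible variant, reusing the data list from above; it also suggests that a list comprehension can work once the predecessor is paired in with zip:

```python
data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

# The first item has no predecessor, so it is always kept.
# Slicing with data[:1] keeps this safe for an empty list.
result = data[:1] + [
    current
    for previous, current in zip(data, data[1:])
    if previous[0] != current[0]
]

print(result)
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
```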

gmds answered Oct 16 '22 04:10


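In order to remove consecutive duplicates, you could use itertools.groupby. Without a key argument it compares whole items, so this collapses only consecutive tuples that are identical in every element:

from itertools import groupby

l = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
# each key k is already the tuple itself
[k for k, _ in groupby(l)]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]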
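
To illustrate how the choice of key matters, here is a small sketch run on the input from the question. The variable names are made up for the comparison; itemgetter(0) is assumed as the key because the question treats the tuple's first element as the deciding value:

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# No key: items are grouped only when the whole tuples are equal.
whole_item = [k for k, _ in groupby(data)]

# Key on the first element: (2, 'b') and (2, 'c') fall into the same run.
first_element = [next(group) for _, group in groupby(data, key=itemgetter(0))]

print(whole_item)
# [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
print(first_element)
# [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```

Only the keyed version reproduces the desired output from the question.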

yatu answered Oct 16 '22 02:10


If I am not mistaken, you only need to look at the last value you kept.

test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a'), (4, 'a')]

result = []

for i in test:
    # Compare only the first element, since the OP considers (1, "a") and (1, "b") duplicates.
    # To compare whole items instead, use: if result and i == result[-1]
    if result and i[0] == result[-1][0]:
        continue
    result.append(i)

print(result)

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a')]
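
The same "compare with the previous one" idea could also be wrapped in a reusable generator. This is a sketch only: the function name and the key parameter are assumptions for illustration, not part of any library:

```python
def drop_consecutive_duplicates(items, key=lambda item: item):
    """Yield items, skipping any whose key equals the key of the item just before it."""
    sentinel = object()  # marks "no previous item yet", so None stays a valid key
    previous = sentinel
    for item in items:
        current = key(item)
        if previous is sentinel or current != previous:
            yield item
        previous = current


mylist = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# Slice assignment replaces the contents of the existing list object.
mylist[:] = drop_consecutive_duplicates(mylist, key=lambda t: t[0])

print(mylist)
# [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```

Since it is a generator, it works on any iterable and does not need the whole input in memory; the slice assignment is only there for the case where the original list has to be updated in place, as in the question's loop.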

Henry Yik answered Oct 16 '22 02:10