Filtering the two first matching elements in a list

Question

I have a list of lists sorted in ascending order, similar to this one:

input = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]

I want to filter this list so that the new list contains only the first two elements (or the single element, if there is only one) that share the same integer in position 0, like so:

output = [[1,1],[1,2],[2,1],[2,2],[3,1],[6,1],[6,2]]

Ideally, the remaining elements (the ones that did not meet the criteria) would stay in the input list, while the matching elements would be stored separately.

How do I go about doing this?

Thank you in advance!

Edit: The elements at index 1 could be virtually any integers, e.g. [[1,6],[1,7],[1,8],[2,1],[2,2]]

Willem Van Onsem · Accepted Answer

Pandas

Although this is a bit overkill, we can use pandas for this:

import pandas as pd

pd.DataFrame(d).groupby(0).head(2).values.tolist()

Here d is the original list. This yields:

>>> pd.DataFrame(d).groupby(0).head(2).values.tolist()
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]

Note that this returns copies of the sublists, not the original list objects. Furthermore, all the rows should have the same number of items.

Itertools `groupby` and `islice`

If the list is sorted, so that elements sharing the same first item are adjacent, we can use itertools.groupby together with itertools.islice:

from operator import itemgetter
from itertools import groupby, islice

[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]

This again yields:

>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]

This approach is also more flexible: the result holds references to the original sublists rather than copies, and the sublists may have different lengths (each needs at least one element here).
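To illustrate both points, here is a small sketch. The helper name first_n_per_key and the sample lists are made up for this example and are not part of the original answer. It shows that the result shares objects with the input, and that groupby only merges adjacent equal keys, so unsorted input needs a sort first.

```python
from itertools import groupby, islice
from operator import itemgetter


def first_n_per_key(rows, n=2):
    """Return the first n rows of each run of rows sharing the same first item."""
    return [e for _, g in groupby(rows, key=itemgetter(0)) for e in islice(g, n)]


d = [[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1]]
picked = first_n_per_key(d)
print(picked)             # [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1]]
print(picked[0] is d[0])  # True: same list object, not a copy

# groupby starts a new group whenever the key changes, so the key 1
# below forms two separate runs and yields three rows in total.
unsorted_rows = [[1, 1], [2, 1], [1, 2], [1, 3], [1, 4]]
runs = first_n_per_key(unsorted_rows)
print(runs)               # [[1, 1], [2, 1], [1, 2], [1, 3]]

# Sorting by the first item (a stable sort) makes equal keys adjacent again.
after_sort = first_n_per_key(sorted(unsorted_rows, key=itemgetter(0)))
print(after_sort)         # [[1, 1], [1, 2], [2, 1]]
```

Sorting with key=itemgetter(0) keeps the original relative order inside each group, which matters if the values at index 1 are not themselves sorted.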

EDIT

The rest of the values can be obtained by letting islice work the opposite way, skipping the first two elements of each group and keeping everything else:

[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]

We then obtain:

>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
[[1, 3], [1, 4], [2, 3]]
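Since the question asks for the leftovers to stay in the input list while the matches are stored separately, both lists can also be built in a single pass. The following is only a sketch under that reading of the question; the function name split_first_n and the variable names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def split_first_n(rows, n=2):
    """Split rows into (first n of each group, everything else).

    Assumes rows sharing the same first item are adjacent.
    """
    kept, rest = [], []
    for _, group in groupby(rows, key=itemgetter(0)):
        for i, row in enumerate(group):
            # The first n rows of each group are kept; the others are leftovers.
            (kept if i < n else rest).append(row)
    return kept, rest


data = [[1, 1], [1, 2], [1, 3], [1, 4], [2, 1], [2, 2], [2, 3], [3, 1], [6, 1], [6, 2]]
output, leftover = split_first_n(data)

# Slice assignment replaces the contents of the existing list object,
# so any other reference to data sees the reduced list as well.
data[:] = leftover

print(output)  # [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
print(data)    # [[1, 3], [1, 4], [2, 3]]
```

Using data[:] = leftover rather than data = leftover is a design choice: it mutates the original list instead of rebinding the name.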

RoadRunner · Answer

You could also use a collections.defaultdict to group the sublists by their first element:

from collections import defaultdict
from pprint import pprint

input_lst = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]

groups = defaultdict(list)
for lst in input_lst:
    key = lst[0]
    groups[key].append(lst)

pprint(groups)

Which gives this grouped dictionary:

defaultdict(<class 'list'>,
        {1: [[1, 1], [1, 2], [1, 3], [1, 4]],
         2: [[2, 1], [2, 2], [2, 3]],
         3: [[3, 1]],
         6: [[6, 1], [6, 2]]})

Then you can take the first two values ([:2]) from each group, and flatten and sort the result at the end:

from itertools import chain

result = sorted(chain.from_iterable(x[:2] for x in groups.values()))

print(result)

Which outputs:

[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
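Unlike groupby, a dictionary-based approach does not need equal keys to be adjacent. As a variation on the same idea (not part of the answer above, and with made-up names such as partition_by_count), one could count how often each key has been seen and split the rows in one pass, keeping the original order without a final sort:

```python
from collections import Counter


def partition_by_count(rows, n=2):
    """Split rows into (first n per key, the rest), preserving input order."""
    seen = Counter()
    kept, rest = [], []
    for row in rows:
        key = row[0]
        # Rows go to kept until their key has already appeared n times.
        (kept if seen[key] < n else rest).append(row)
        seen[key] += 1
    return kept, rest


# Deliberately unsorted sample data, invented for this example.
rows = [[2, 9], [1, 5], [2, 1], [1, 7], [2, 4], [1, 0]]
kept, rest = partition_by_count(rows)

print(kept)  # [[2, 9], [1, 5], [2, 1], [1, 7]]
print(rest)  # [[2, 4], [1, 0]]
```

On the sorted input from the question this gives the same kept list as the other approaches, plus the leftovers [[1, 3], [1, 4], [2, 3]].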

Tags: python, iteration, list, python-3.x, filter

BaconBad
