Filtering the first two matching elements in a list

I have a list of lists sorted in ascending order, similar to this one:

input = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]

I want to filter this list so that the new list contains only the first two elements (or the only element) for each integer value in position 0, like so:

output = [[1,1],[1,2],[2,1],[2,2],[3,1],[6,1],[6,2]]

Ideally, the remaining elements (the ones that did not meet the criteria) would stay in the input list, while the matching elements would be stored separately.

How do I go about doing this?

Thank you in advance!

Edit: The elements at index 1 could be virtually any integers, e.g. [[1,6],[1,7],[1,8],[2,1],[2,2]]

BaconBad asked Jan 28 '23 22:01


2 Answers

Pandas

Although this is a bit overkill, we can use pandas for this:

import pandas as pd

pd.DataFrame(d).groupby(0).head(2).values.tolist()

Here d is the original list. This yields:

>>> pd.DataFrame(d).groupby(0).head(2).values.tolist()
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]

Note that this returns copies of the sublists, not the original list objects. Furthermore, all rows should have the same number of items.

Itertools groupby and islice

If sublists with the same first element are adjacent, as they are in a sorted list, we can use itertools.groupby:

from operator import itemgetter
from itertools import groupby, islice

[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]

This again yields:

>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]

This is also more flexible: the result holds references to the original sublists rather than copies, and the sublists may have different lengths (as long as each has at least one element).
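One caveat worth spelling out: groupby only merges consecutive items that share a key, so the comprehension above relies on sublists with the same first element being adjacent. The following is a minimal sketch of what happens otherwise and how sorting by the first element restores the expected result. The list `unsorted` and the helper `first_two_per_run` are hypothetical names used only for illustration.

```python
from itertools import groupby, islice
from operator import itemgetter

# Hypothetical input in which the sublists starting with 1 are not adjacent.
unsorted = [[1, 1], [2, 1], [1, 2], [1, 3], [2, 2], [2, 3]]

def first_two_per_run(rows):
    """Keep at most two items from each run of consecutive equal first elements."""
    return [e for _, g in groupby(rows, itemgetter(0)) for e in islice(g, 2)]

# Without sorting, every run of equal keys is treated as its own group,
# so more than two sublists starting with 1 survive.
naive = first_two_per_run(unsorted)

# sorted() is stable, so rows with the same key keep their relative order.
fixed = first_two_per_run(sorted(unsorted, key=itemgetter(0)))

print(naive)  # [[1, 1], [2, 1], [1, 2], [1, 3], [2, 2], [2, 3]]
print(fixed)  # [[1, 1], [1, 2], [2, 1], [2, 2]]
```

Sorting with key=itemgetter(0) rather than a plain sorted() call leaves the order of the second elements untouched, which matters if they are not in ascending order.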

EDIT

The rest of the values can be obtained by having islice do the opposite: skip the first two elements of each group and keep everything after them:

[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]

We then obtain:

>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
[[1, 3], [1, 4], [2, 3]]
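
If both lists are needed, the two comprehensions can be folded into a single pass. This is only a sketch under the same adjacency assumption; the function name `split_first_n` and its parameter `n` are made up for this example. The slice assignment at the end is one way to leave the leftovers in the original list object, as the question asks.

```python
from itertools import groupby
from operator import itemgetter

def split_first_n(rows, n=2):
    """Split rows into (first n of each group, everything else).

    Assumes rows with the same first element are adjacent.
    """
    kept, rest = [], []
    for _, group in groupby(rows, key=itemgetter(0)):
        for i, row in enumerate(group):
            # The first n rows of each group are kept; the others are leftovers.
            (kept if i < n else rest).append(row)
    return kept, rest

d = [[1, 1], [1, 2], [1, 3], [1, 4], [2, 1], [2, 2], [2, 3], [3, 1], [6, 1], [6, 2]]
kept, rest = split_first_n(d)
d[:] = rest  # slice assignment replaces the contents of the original list in place

print(kept)  # [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
print(d)     # [[1, 3], [1, 4], [2, 3]]
```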
Willem Van Onsem answered Feb 13 '23 21:02


You could also use a collections.defaultdict to group the sublists by their first element:

from collections import defaultdict
from pprint import pprint

input_lst = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]

groups = defaultdict(list)
for lst in input_lst:
    key = lst[0]
    groups[key].append(lst)

pprint(groups)

Which gives this grouped dictionary:

defaultdict(<class 'list'>,
        {1: [[1, 1], [1, 2], [1, 3], [1, 4]],
         2: [[2, 1], [2, 2], [2, 3]],
         3: [[3, 1]],
         6: [[6, 1], [6, 2]]})

Then you could take the first two values ([:2]) from each group and make sure the result is flattened and sorted at the end:

from itertools import chain

result = sorted(chain.from_iterable(x[:2] for x in groups.values()))

print(result)

Which outputs:

[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
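
A variation on the same idea, sketched here as an assumption rather than as part of the approach above, counts how many sublists have been seen per key instead of storing the groups. It keeps the original order without a final sort, does not require the input to be sorted, and also collects the leftovers. The name `take_first_n_per_key` and the sample `rows` are hypothetical.

```python
from collections import defaultdict

def take_first_n_per_key(rows, n=2):
    """Return (kept, rest): the first n rows per first element, and the others."""
    seen = defaultdict(int)  # how many rows with each key have been seen so far
    kept, rest = [], []
    for row in rows:
        key = row[0]
        if seen[key] < n:
            kept.append(row)
        else:
            rest.append(row)
        seen[key] += 1
    return kept, rest

# Hypothetical unsorted input with arbitrary integers at index 1.
rows = [[2, 9], [1, 6], [2, 1], [1, 7], [1, 8], [2, 2]]
kept, rest = take_first_n_per_key(rows)

print(kept)  # [[2, 9], [1, 6], [2, 1], [1, 7]]
print(rest)  # [[1, 8], [2, 2]]
```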
RoadRunner answered Feb 13 '23 22:02