I have a list of lists sorted in ascending order, similar to this one:
input = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]
I want to filter this list so that the new list contains only the first two elements (or the only element) that share the same integer in position 0, like so:
output = [[1,1],[1,2],[2,1],[2,2],[3,1],[6,1],[6,2]]
It would be ideal if the remaining elements (the ones which did not meet the criteria) would remain on the input list, while the matching elements would be stored separately.
How do I go about doing this?
Thank you in advance!
Edit: The elements at index 1 could be virtually any integers, e.g. [[1,6],[1,7],[1,8],[2,1],[2,2]]
Although this is a bit overkill, we can use pandas for this:
import pandas as pd
pd.DataFrame(d).groupby(0).head(2).values.tolist()
Here d is the original list. This yields:
>>> pd.DataFrame(d).groupby(0).head(2).values.tolist()
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
Note that this returns copies of the sublists, not the original list objects. Furthermore, all the rows should have the same number of items.
groupby and islice
If the list is sorted, so that sublists with the same first element are adjacent, then we can use itertools.groupby:
from operator import itemgetter
from itertools import groupby, islice
[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
This again yields:
>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2)]
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
This approach is also more flexible: the result holds references to the original sublists rather than copies, and the sublists may have different lengths (each needs at least one element here).
EDIT
The rest of the values can be obtained by letting islice work the opposite way, retaining everything but the first two elements of each group:
[e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
We then obtain:
>>> [e for _, g in groupby(d, itemgetter(0)) for e in islice(g, 2, None)]
[[1, 3], [1, 4], [2, 3]]
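Since the question asks for the leftover elements to stay on the input list while the matches are stored separately, both halves can be collected in one pass instead of running groupby twice. The following is only a sketch under that reading of the question: the helper name split_first_n and its parameter n are made up for illustration, and it assumes, as above, that sublists with the same first element are adjacent.

```python
from itertools import groupby
from operator import itemgetter

def split_first_n(rows, n=2):
    """Hypothetical helper: return (kept, rest) for runs sharing row[0]."""
    kept, rest = [], []
    for _, group in groupby(rows, key=itemgetter(0)):
        members = list(group)     # materialise the run so it can be sliced twice
        kept.extend(members[:n])  # first n rows of this run
        rest.extend(members[n:])  # everything after the first n
    return kept, rest

d = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]
output, remaining = split_first_n(d)
d[:] = remaining  # slice assignment updates the original list object in place
```

After this runs, output holds the first two sublists per key and d contains only the leftovers. The slice assignment d[:] = ... is used rather than d = ... so that any other reference to the same list sees the change as well.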
You could also use a collections.defaultdict to group the sublists by their first element:
from collections import defaultdict
from pprint import pprint
input_lst = [[1,1],[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[3,1],[6,1],[6,2]]
groups = defaultdict(list)
for lst in input_lst:
    key = lst[0]
    groups[key].append(lst)
pprint(groups)
Which gives this grouped dictionary:
defaultdict(<class 'list'>,
            {1: [[1, 1], [1, 2], [1, 3], [1, 4]],
             2: [[2, 1], [2, 2], [2, 3]],
             3: [[3, 1]],
             6: [[6, 1], [6, 2]]})
Then you could just take the first two values ([:2]) from each key, and make sure the result is flattened and sorted at the end:
from itertools import chain
result = sorted(chain.from_iterable(x[:2] for x in groups.values()))
print(result)
Which outputs:
[[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [6, 1], [6, 2]]
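A related variant, sketched here as an assumption rather than as part of the answer above, counts how many times each key has been seen instead of storing whole groups. It keeps the original order without a final sort, also works when equal keys are not adjacent, and returns the leftover elements as well. The function name split_by_count and its parameter n are hypothetical.

```python
from collections import defaultdict

def split_by_count(rows, n=2):
    """Hypothetical helper: keep the first n rows seen per row[0], in input order."""
    seen = defaultdict(int)  # key -> number of rows seen so far
    kept, rest = [], []
    for row in rows:
        key = row[0]
        if seen[key] < n:
            kept.append(row)
        else:
            rest.append(row)
        seen[key] += 1
    return kept, rest

# Deliberately unsorted input to show that adjacency is not required
rows = [[2,1],[1,6],[2,2],[1,7],[2,3],[1,8]]
kept, rest = split_by_count(rows)
print(kept)
print(rest)
```

The trade-off is one dictionary lookup per row, in exchange for not needing the input to be sorted or grouped beforehand.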