This code:
from itertools import groupby, count
L = [38, 98, 110, 111, 112, 120, 121, 898]
groups = groupby(L, key=lambda item, c=count():item-next(c))
tmp = [list(g) for k, g in groups]
Takes [38, 98, 110, 111, 112, 120, 121, 898] and groups it by consecutive numbers; merging each group then gives this final output:
['38', '98', '110,112', '120,121', '898']
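The snippet above stops at tmp, which is a list of lists ([[38], [98], [110, 111, 112], [120, 121], [898]]). Below is a minimal sketch of the merge step. It assumes, based only on the sample output, that a run is reduced to its first and last values joined by a comma and that a lone number is kept as-is. The name merged is illustrative.

```python
from itertools import groupby, count

L = [38, 98, 110, 111, 112, 120, 121, 898]

# Consecutive integers share the same value of (item - running counter),
# so groupby collects each run of consecutive numbers into one group.
groups = groupby(L, key=lambda item, c=count(): item - next(c))
tmp = [list(g) for k, g in groups]

# Assumed merge rule (inferred from the sample output): a single number
# stays as-is, and a run is reduced to "first,last".
merged = [str(g[0]) if len(g) == 1 else f"{g[0]},{g[-1]}" for g in tmp]
print(merged)  # ['38', '98', '110,112', '120,121', '898']
```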
How can the same be done with a list of lists that has multiple columns, such as the list below, grouping the rows by name and by consecutive values in the second column and then merging them?
In other words, this data:
L= [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2']]
should give the following output:
[['Italy','1-2-3','3-1-10'],
['France','5','3'],
['Spain','4-5-6','2-8-4'],
['Spain','20','2']]
Would more-itertools be more appropriate for this task?
Group and combine items of multiple-column lists with itertools/more-itertools in Python
You can build on the same recipe and modify the lambda function so that the key also includes the first item (the country) of each row. You also need to sort the list first, based on the last occurrence of each country in the list.
from itertools import groupby, count
L = [
['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2']]
indices = {row[0]: i for i, row in enumerate(L)}
sorted_l = sorted(L, key=lambda row: indices[row[0]])
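groups = groupby(
    sorted_l,
    # Key: (country, value - running count); it stays the same along a
    # run of consecutive 2nd-column values for one country
    lambda item, c=count(): (item[0], int(item[1]) - next(c))
)
for k, g in groups:
    # Keep the country once, then join each remaining column with '-'
    print([k[0]] + ['-'.join(col) for col in zip(*(row[1:] for row in g))])
Output:
['Italy', '1-2-3', '3-1-10']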
['France', '5', '3']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
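Sorting first matters because itertools.groupby only merges adjacent items whose keys compare equal. A quick check on the question's data, grouping by country alone, shows what happens without it:

```python
from itertools import groupby
from operator import itemgetter

L = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2'],
]

# groupby only merges *adjacent* rows with equal keys, so on the
# unsorted data a country can appear in several separate groups.
unsorted_keys = [k for k, _ in groupby(L, key=itemgetter(0))]
print(unsorted_keys)  # ['Italy', 'Spain', 'Italy', 'Spain', 'France', 'Spain']

# After a stable sort on the country, each country forms a single group.
sorted_keys = [k for k, _ in groupby(sorted(L, key=itemgetter(0)), key=itemgetter(0))]
print(sorted_keys)  # ['France', 'Italy', 'Spain']
```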
This is essentially the same grouping technique, but rather than using itertools.count it uses enumerate to produce the indices.
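The two are interchangeable because both supply a running index, and subtracting that index from each value gives a key that stays constant along a run of consecutive numbers. A small sketch comparing the two keys on the question's first list:

```python
from itertools import count, groupby

L = [38, 98, 110, 111, 112, 120, 121, 898]

# Key based on itertools.count: value minus a running counter.
by_count = [list(g) for _, g in groupby(L, key=lambda item, c=count(): item - next(c))]

# Same idea with enumerate: each element is an (index, value) pair,
# so the key is value - index and the value is pulled back out afterwards.
by_enumerate = [[v for _, v in g]
                for _, g in groupby(enumerate(L), key=lambda t: t[1] - t[0])]

print(by_count == by_enumerate)  # True
print(by_enumerate)  # [[38], [98], [110, 111, 112], [120, 121], [898]]
```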
First, we sort the data so that all rows for a given country are adjacent and sorted by the second column. Then we use groupby to make a group for each country, and groupby again in the inner loop to collect the consecutive rows within each country. Finally, we use zip and .join to rearrange the data into the desired output format.
from itertools import groupby
from operator import itemgetter
lst = [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2'],
]
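# Outer groupby: one group per country. Rows are sorted by country, then
# numerically by the 2nd column (a plain string sort puts '20' before '4').
# Inner groupby: value - index is constant within a consecutive run.
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
          for country, u in groupby(sorted(lst, key=lambda row: (row[0], int(row[1]))), itemgetter(0))
          for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]
for row in newlst:
    print(row)
Output:
['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']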
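A caveat when sorting these rows: the second column holds strings, so sorting without a key orders it lexicographically ('20' sorts before '4', and '10' before '9'). When a run crosses a digit boundary, this can split it. The sketch below uses hypothetical 'Italy' rows with values 9, 10 and 11 and compares a plain sort with one keyed on int(row[1]); the helper name runs is illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows whose second column crosses a digit boundary.
rows = [['Italy', '9', 'a'], ['Italy', '10', 'b'], ['Italy', '11', 'c']]

def runs(sorted_rows):
    """Group rows by country, then by consecutive second-column values."""
    return [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
            for country, u in groupby(sorted_rows, itemgetter(0))
            for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]

# String sort: '10' < '11' < '9', so 9 lands after 11 and the run splits.
print(runs(sorted(rows)))  # [['Italy', '10-11', 'b-c'], ['Italy', '9', 'a']]

# Numeric sort within each country keeps the run together.
print(runs(sorted(rows, key=lambda r: (r[0], int(r[1])))))  # [['Italy', '9-10-11', 'a-b-c']]
```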
I admit that lambda is a bit cryptic; it would probably be better to use a proper def function instead.
Here's the same thing using a much more readable key function.
def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i
# Sort by country, then numerically by the 2nd column, before grouping
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
          for country, u in groupby(sorted(lst, key=lambda row: (row[0], int(row[1]))), itemgetter(0))
          for _, g in groupby(enumerate(u), keyfunc)]
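The zip/join step mentioned above transposes a group of rows into columns. A small illustration on one consecutive group of Spain rows from the question, with the country column already dropped:

```python
# One consecutive group of Spain rows, first column already dropped.
group = [['4', '2'], ['5', '8'], ['6', '4']]

# zip(*group) turns rows into columns: ('4', '5', '6') and ('2', '8', '4').
columns = list(zip(*group))
print(columns)  # [('4', '5', '6'), ('2', '8', '4')]

# Joining each column with '-' gives the merged cells of the output row.
row = ['Spain'] + ['-'.join(col) for col in columns]
print(row)  # ['Spain', '4-5-6', '2-8-4']
```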
Instead of using itertools.groupby, which requires sorting the data and several extra steps, here is an approach using dictionaries that processes the rows in a single pass over L:
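d = {}
flag = False
for country, i, j in L:
    temp = 1
    try:
        item = int(i)
        # Look for an existing group of this country whose last (largest)
        # value is within 1 of the new value; d[country] raises KeyError
        # for a country seen for the first time
        for counter, recs in d[country].items():
            temp += 1
            last = int(recs[-1][0])
            if item in {last - 1, last, last + 1}:
                recs.append([i, j])
                recs.sort(key=lambda x: int(x[0]))
                flag = True
                break
        if flag:
            flag = False
            continue
        else:
            # No matching group: start a new one under the next free key
            d[country][temp] = [[i, j]]
    except KeyError:
        # First row for this country
        d[country] = {}
        d[country][1] = [[i, j]]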
Demo on a more complex example. Running the code above with this L produces the dictionary d shown after it:
L = [['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2'],
['France', '5', '44'],
['France', '9', '3'],
['Italy', '3', '10'],
['Italy', '5', '17'],
['Italy', '4', '13'],]
{'France': {1: [['5', '3'], ['5', '44']], 2: [['9', '3']]},
'Spain': {1: [['4', '2'], ['5', '8'], ['6', '4']], 2: [['20', '2']]},
'Italy': {1: [['1', '3'], ['2', '1'], ['3', '10'], ['3', '10'], ['4', '13']], 2: [['5', '17']]}}
# You can then produce the results in your intended format as below:
for country, recs in d.items():
    for rec in recs.values():
        # Transpose the [i, j] pairs into two columns and join each with '-'
        i, j = zip(*rec)
        print([country, '-'.join(i), '-'.join(j)])
['France', '5-5', '3-44']
['France', '9', '3']
['Italy', '1-2-3-3-4', '3-1-10-10-13']
['Italy', '5', '17']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
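In this demo, Italy 5 stays in its own group even though 4 is present: each new row is compared only with the last value of each existing group at the time it arrives, and groups are never merged afterwards. If fully merged runs are wanted, one alternative (a different technique, not part of the dictionary answer above) is to bucket the rows per country, sort each bucket numerically, and start a new run whenever the gap to the previous value exceeds 1. This is a sketch under two assumptions: repeated values such as the duplicate '3' stay in the same run (as they do in the output above), and every row has exactly three columns. merge_runs is a hypothetical name.

```python
from collections import defaultdict

def merge_runs(rows):
    """Hypothetical helper: merge rows per country into runs of consecutive
    2nd-column values (gap <= 1, so duplicates stay in the same run)."""
    buckets = defaultdict(list)
    for country, i, j in rows:
        buckets[country].append((int(i), i, j))

    result = []
    for country, recs in buckets.items():
        recs.sort(key=lambda r: r[0])  # numeric order within the country
        runs = [[recs[0]]]
        for rec in recs[1:]:
            if rec[0] - runs[-1][-1][0] <= 1:
                runs[-1].append(rec)  # continues the current run
            else:
                runs.append([rec])  # gap > 1: start a new run
        for run in runs:
            result.append([country,
                           '-'.join(r[1] for r in run),
                           '-'.join(r[2] for r in run)])
    return result

demo = [['Italy', '1', '3'], ['Italy', '2', '1'], ['Spain', '4', '2'],
        ['Spain', '5', '8'], ['Italy', '3', '10'], ['Spain', '6', '4'],
        ['France', '5', '3'], ['Spain', '20', '2'], ['France', '5', '44'],
        ['France', '9', '3'], ['Italy', '3', '10'], ['Italy', '5', '17'],
        ['Italy', '4', '13']]
for row in merge_runs(demo):
    print(row)
# ['Italy', '1-2-3-3-4-5', '3-1-10-10-13-17']
# ['Spain', '4-5-6', '2-8-4']
# ['Spain', '20', '2']
# ['France', '5-5', '3-44']
# ['France', '9', '3']
```

Dictionaries keep insertion order in Python 3.7+, so countries come out in the order they first appear in the input.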