Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

group list of ints by continuous sequence

Tags:

python

I have a list of integers...

[1,2,3,4,5,8,9,10,11,200,201,202]

I would like to group them into a list of lists where each sublist contains integers whose sequence has not been broken. Like this...

[[1,5],[8,11],[200,202]]

I have a rather clunky work around...

lSequenceOfNum = [1,2,3,4,5,8,9,10,11,200,201,202]

lGrouped = []
start = 0
for x in range(0,len(lSequenceOfNum)):
    if x != len(lSequenceOfNum)-1:
        if(lSequenceOfNum[x+1] - lSequenceOfNum[x]) > 1:
            lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
            start = x+1

    else:
        lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
print lGrouped

It is the best I could do. Is there a more "pythonic" way to do this? Thanks..

like image 731
b10hazard Avatar asked May 02 '12 19:05

b10hazard


4 Answers

Assuming the list will always be in ascending order:

from itertools import groupby, count

numberlist = [1,2,3,4,5,8,9,10,11,200,201,202]

def as_range(g):
    l = list(g)
    return l[0], l[-1]

print [as_range(g) for _, g in groupby(numberlist, key=lambda n, c=count(): n-next(c))]
like image 87
Jeff Mercado Avatar answered Nov 16 '22 13:11

Jeff Mercado


I realised I had overcomplicated this a little, far easier to just count manually than use a slightly convoluted generator:

def ranges(seq):
    start, end = seq[0], seq[0]
    count = start
    for item in seq:
        if not count == item:
            yield start, end
            start, end = item, item
            count = item
        end = item
        count += 1
    yield start, end

print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))

Producing:

[(1, 5), (8, 11), (200, 202)]

This method is pretty fast:

This method (and the old one, they perform almost exactly the same):

python -m timeit -s "from test import ranges" "ranges([1,2,3,4,5,8,9,10,11,200,201,202])"
1000000 loops, best of 3: 0.47 usec per loop

Jeff Mercado's Method:

python -m timeit -s "from test import as_range; from itertools import groupby, count" "[as_range(g) for _, g in groupby([1,2,3,4,5,8,9,10,11,200,201,202], key=lambda n, c=count(): n-next(c))]"
100000 loops, best of 3: 11.1 usec per loop

That's over 20x faster - although, naturally, unless speed matters this isn't a real concern.


My old solution using generators:

import itertools

def resetable_counter(start):
    while True:
        for i in itertools.count(start):
            reset = yield i
            if reset:
                start = reset
                break

def ranges(seq):
    start, end = seq[0], seq[0]
    counter = resetable_counter(start)
    for count, item in zip(counter, seq): #In 2.x: itertools.izip(counter, seq)
        if not count == item:
            yield start, end
            start, end = item, item
            counter.send(item)
        end = item
    yield start, end

print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))

Producing:

[(1, 5), (8, 11), (200, 202)]
like image 44
Gareth Latty Avatar answered Nov 16 '22 15:11

Gareth Latty


You can do this efficiently in three steps

given

list1=[1,2,3,4,5,8,9,10,11,200,201,202]

Calculate the discontinuity

     [1,2,3,4,5,8,9,10,11 ,200,201,202]
-      [1,2,3,4,5,8,9 ,10 ,11 ,200,201,202]
----------------------------------------
       [1,1,1,1,3,1,1 ,1  ,189,1  ,1]
(index) 1 2 3 4 5 6 7  8   9   10  11 
                *          *
rng = [i+1 for i,e in enumerate((x-y for x,y in zip(list1[1:],list1))) if e!=1] 
>>> rng
[5, 9]

Add the boundaries

rng = [0] + rng + [len(list1)]
>>> rng
[0, 5, 9,12]

now calculate the actual continuity ranges

[(list1[i],list1[j-1]) for i,j in zip(list2,list2[1:])]
[(1, 5), (8, 11), (200, 202)]

LB                [0,   5,    9,  12]
UB             [0, 5,   9,    12]
     -----------------------
indexes (LB,UB-1) (0,4) (5,8) (9,11)
like image 2
Abhijit Avatar answered Nov 16 '22 14:11

Abhijit


The question is quite old, but I thought I'll share my solution anyway

Assuming import numpy as np

a = [1,2,3,4,5,8,9,10,11,200,201,202]

np.split(a, array(np.add(np.where(diff(a)>1),1)).tolist()[0])
like image 1
Aguy Avatar answered Nov 16 '22 13:11

Aguy