I have a list of integers...
[1,2,3,4,5,8,9,10,11,200,201,202]
I would like to group them into a list of lists where each sublist contains integers whose sequence has not been broken. Like this...
[[1,5],[8,11],[200,202]]
I have a rather clunky work around...
lSequenceOfNum = [1,2,3,4,5,8,9,10,11,200,201,202]
lGrouped = []
start = 0
for x in range(0,len(lSequenceOfNum)):
if x != len(lSequenceOfNum)-1:
if(lSequenceOfNum[x+1] - lSequenceOfNum[x]) > 1:
lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
start = x+1
else:
lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
print lGrouped
It is the best I could do. Is there a more "pythonic" way to do this? Thanks..
Assuming the list will always be in ascending order:
from itertools import groupby, count
numberlist = [1,2,3,4,5,8,9,10,11,200,201,202]
def as_range(g):
l = list(g)
return l[0], l[-1]
print [as_range(g) for _, g in groupby(numberlist, key=lambda n, c=count(): n-next(c))]
I realised I had overcomplicated this a little, far easier to just count manually than use a slightly convoluted generator:
def ranges(seq):
start, end = seq[0], seq[0]
count = start
for item in seq:
if not count == item:
yield start, end
start, end = item, item
count = item
end = item
count += 1
yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
This method is pretty fast:
This method (and the old one, they perform almost exactly the same):
python -m timeit -s "from test import ranges" "ranges([1,2,3,4,5,8,9,10,11,200,201,202])"
1000000 loops, best of 3: 0.47 usec per loop
Jeff Mercado's Method:
python -m timeit -s "from test import as_range; from itertools import groupby, count" "[as_range(g) for _, g in groupby([1,2,3,4,5,8,9,10,11,200,201,202], key=lambda n, c=count(): n-next(c))]"
100000 loops, best of 3: 11.1 usec per loop
That's over 20x faster - although, naturally, unless speed matters this isn't a real concern.
My old solution using generators:
import itertools
def resetable_counter(start):
while True:
for i in itertools.count(start):
reset = yield i
if reset:
start = reset
break
def ranges(seq):
start, end = seq[0], seq[0]
counter = resetable_counter(start)
for count, item in zip(counter, seq): #In 2.x: itertools.izip(counter, seq)
if not count == item:
yield start, end
start, end = item, item
counter.send(item)
end = item
yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
You can do this efficiently in three steps
given
list1=[1,2,3,4,5,8,9,10,11,200,201,202]
Calculate the discontinuity
[1,2,3,4,5,8,9,10,11 ,200,201,202]
- [1,2,3,4,5,8,9 ,10 ,11 ,200,201,202]
----------------------------------------
[1,1,1,1,3,1,1 ,1 ,189,1 ,1]
(index) 1 2 3 4 5 6 7 8 9 10 11
* *
rng = [i+1 for i,e in enumerate((x-y for x,y in zip(list1[1:],list1))) if e!=1]
>>> rng
[5, 9]
Add the boundaries
rng = [0] + rng + [len(list1)]
>>> rng
[0, 5, 9,12]
now calculate the actual continuity ranges
[(list1[i],list1[j-1]) for i,j in zip(list2,list2[1:])]
[(1, 5), (8, 11), (200, 202)]
LB [0, 5, 9, 12]
UB [0, 5, 9, 12]
-----------------------
indexes (LB,UB-1) (0,4) (5,8) (9,11)
The question is quite old, but I thought I'll share my solution anyway
Assuming import numpy as np
a = [1,2,3,4,5,8,9,10,11,200,201,202]
np.split(a, array(np.add(np.where(diff(a)>1),1)).tolist()[0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With