I have a sorted list of integers and I find want find to the number runs in this list. I have seen many examples when looking for numbers runs that increment by 1, but I also want to look for number runs where the difference between numbers is customizable.
For example, say I have the following list of numbers:
nums = [1, 2, 3, 6, 7, 8, 10, 12, 14, 18, 25, 28, 31, 39]
Using the example found here, I'm able to find the following number runs:
[[1, 2, 3], [6, 7, 8], [10], [12], [14], [18], [25], [28], [31], [39]]
However, I want to look for number runs where the difference between two numbers could be more than just 1. For example, I want all number runs with a distance of less than or equal to 2.
[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]
Or maybe I want all number runs with a distance of less than or equal to 3.
[[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]
Here is the function that I'm working with now to get number runs with a distance of 1.
def runs(seq, n):
result = []
for s in seq:
if not result or s != result[-1][-1] + n:
# Start a new run if we can't continue the previous one.
result.append([])
result[-1].append(s)
return result
With the current function, if I set n=1
, then I find all consecutive number sequences. If I set n=2
, then I only find [8, 10, 12, 14]
. How can I modify this function to find number runs that are less than or equal to n
?
I want to be able to do this:
runs(num, 2)
[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]
for
I'm just fixing your code with this one. To simplify things, you can initialise result
with seq[0]
.
def runs(seq, n):
result = [[seq[0]]]
for s in seq[1:]:
if s - result[-1][-1] > n: # Keep it simple. Compare the delta.
result.append([])
result[-1].append(s)
return result
>>> runs(nums, 1)
[[1, 2, 3], [6, 7, 8], [10], [12], [14], [18], [25], [28], [31], [39]]
>>> runs(nums, 2)
[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]
pandas.GroupBy
If you want to get fancy, you can use the groupby
idiom, made easy with pandas.
import pandas as pd
def runs2(seq, n):
s = pd.Series(seq)
return s.groupby(s.diff().gt(n).cumsum()).apply(pd.Series.tolist).tolist()
>>> runs2(nums, 3)
[[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]
There are two essential ingredients here: the grouper (the predicate you're grouping on), and the agg function (the function you'll apply on each group)
The grouper is s.diff().gt(n).cumsum()
, broken down calculates three things:
seq
using diff
n
The output of this operation is
s.diff().gt(n).cumsum()
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 2
10 3
11 4
12 5
13 6
dtype: int64
The agg function is pd.Series.tolist
, and will convert any series to a list. That's what we need here, a nested list.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With