I have a list of numbers. I want to extract from the list runs of numbers that fall inside some band and have some minimum length. For instance, suppose the I want to operate on this list:
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, 1.4, -0.3, -0.1, -0.7}
with band=1
and runLength=3
. I would like to have
{{-0.6, -0.8, -0.1}, {-0.3, -0.1, -0.7}}
as the result. Right now I'm using
Cases[
Partition[thisList,runLength,1],
x_ /; Abs[Max[x] - Min[x]] < band
]
The main problem is that where runs overlap, I get many copies of the run. For instance, using
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
gives me
{{-0.6, -0.8, -0.1}, {-0.8, -0.1, -0.5}, {-0.1, -0.5, -0.3}, {-0.5, -0.3, -0.1}, {-0.3, -0.1, -0.7}}
where I would rather have
{-0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
without doing some hokey reduction of the overlapping result. What's the proper way? It'd be nice if it didn't involve exploding the data using Partition
.
EDIT
Apparenty, my first solution has at least two serious flaws: it is dead slow and completely impractical for lists larger than 100 elements, and it contains some bug(s) which I wasn't able to identify yet - it is missing some bands sometimes. So, I will provide two (hopefuly correct) and much more efficient alternatives, and I provide the flawed one below for any one interested.
Here is a solution based on linked lists. It allows us to still use patterns but avoid inefficiencies caused by patterns containing __
or ___
(when repeatedly applied):
ClearAll[toLinkedList];
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse@x]
ClearAll[accumF];
accumF[llFull_List, acc_List, {h_, t_List}, ctr_, max_, min_, band_, rLen_] :=
With[{cmax = Max[max, h], cmin = Min[min, h]},
accumF[llFull, {acc, h}, t, ctr + 1, cmax, cmin, band, rLen] /;
Abs[cmax - cmin] < band];
accumF[llFull_List, acc_List, ll : {h_, _List}, ctr_, _, _, band_, rLen_] /; ctr >= rLen :=
accumF[ll, (Sow[acc]; {}), ll, 0, h, h, band, rLen];
accumF[llFull : {h_, t : {_, _List}}, _List, ll : {head_, _List}, _, _, _, band_, rLen_] :=
accumF[t, {}, t, 0, First@t, First@t, band, rLen];
accumF[llFull_List, acc_List, {}, ctr_, _, _, _, rLen_] /; ctr >= rLen := Sow[acc];
ClearAll[getBandsLL];
getBandsLL[lst_List, runLength_Integer, band_?NumericQ] :=
Block[{$IterationLimit = Infinity},
With[{ll = toLinkedList@lst},
Map[Flatten,
If[# === {}, #, First@#] &@
Reap[
accumF[ll, {}, ll, 0, First@ll, First@ll, band,runLength]
][[2]]
]
]
];
Here are examples of use:
In[246]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[246]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[247]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[247]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
The main idea of the function accumF
is to traverse the number list (converted to a linked list prior to that), and accumulate a band in another linked list, which is passed to it as a second argument. Once the band condition fails, the accumulated band is memorized using Sow
(if it was long enough), and the process starts over with the remaining part of the linked list. The ctr
parameter may not be needed if we would choose to use Depth[acc]
instead.
There are a few non-obvious things present in the above code. One subtle point is that an attempt to join the two middle rules for accumF
into a single rule (they look very similar) and use CompoundExpression
(something like (If[ctr>=rLen, Sow[acc];accumF[...])
) on the r.h.s. would lead to a non-tail-recursive accumF
(See this answer for a more detailed discussion of this issue. This is also why I make the (Sow[acc]; {})
line inside a function call - to avoid the top-level CompoundExpression
on the r.h.s.). Another subtle point is that I have to maintain a copy of the linked list containing the remaining elements right after the last successful match was found, since in the case of unsuccessful sequence I need to roll back to that list minus its first element, and start over. This linked list is stored in the first argument of accumF
.
Note that passing large linked lists does not cost much, since what is copied is only a first element (head) and a pointer to the rest (tail). This is the main reason why using linked lists vastly improves performance, as compared to the case of patterns like {___,x__,right___}
- because in the latter case, a full sequences x
or right
are copied. With linked lists, we effectively only copy a few references, and therefore our algorithms behave roughly as we expect (linearly in the length of the data list here). In this answer, I also mentioned the use of linked lists in such cases as one of the techniques to optimize code (section 3.4).
Here is a straightforward but not too elegant function based on Compile
, which finds a list of starting and ending bands positions in the list:
bandPositions =
Compile[{{lst, _Real, 1}, {runLength, _Integer}, {band, _Real}},
Module[{i = 1, j, currentMin, currentMax,
startEndPos = Table[{0, 0}, {Length[lst]}], ctr = 0},
For[i = 1, i <= Length[lst], i++,
currentMin = currentMax = lst[[i]];
For[j = i + 1, j <= Length[lst], j++,
If[lst[[j]] < currentMin,
currentMin = lst[[j]],
(* else *)
If[lst[[j]] > currentMax,
currentMax = lst[[j]]
]
];
If[Abs[currentMax - currentMin] >= band ,
If[ j - i >= runLength,
startEndPos[[++ctr]] = {i, j - 1}; i = j - 1
];
Break[],
(* else *)
If[j == Length[lst] && j - i >= runLength - 1,
startEndPos[[++ctr]] = {i, j}; i = Length[lst];
Break[];
];
]
]; (* inner For *)
]; (* outer For *)
Take[startEndPos, ctr]], CompilationTarget -> "C"];
This is used in the final function:
getBandsC[lst_List, runLength_Integer, band_?NumericQ] :=
Map[Take[lst, #] &, bandPositions[lst, runLength, band]]
Examples of use:
In[305]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[305]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[306]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[306]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
In[381]:=
largeTest = RandomReal[{-5,5},50000];
(res1 =getBandsLL[largeTest,3,1]);//Timing
(res2 =getBandsC[largeTest,3,1]);//Timing
res1==res2
Out[382]= {1.109,Null}
Out[383]= {0.016,Null}
Out[384]= True
Obviously, if one wants performance, Compile
wins hands down. My observations for larger lists are that both solutions have approximately linear complexity with the size of the number list (as they should), with compiled one roughly 150 times faster on my machine than the one based on linked lists.
In fact, both methods encode the same algorithm, although this may not be obvious. The one with recursion and patterns is arguably somewhat more understandable, but that is a matter of opinion.
Here is the original code that I wrote first to solve this problem. This is based on a rather straightforward use of patterns and repeated rule application. As mentioned, one disadvantage of this method is its very bad performance. This is actually another case against using constructs like {___,x__,y___}
in conjunction with repeated rule application, for anything longer than a few dozens elements. In the mentioned recommendations for code optimization techniques, this corresponds to the section 4.1.
Anyways, here is the code:
If[# === {}, #, First@#] &@
Reap[thisList //. {
left___,
Longest[x__] /;Length[{x}] >= runLength && Abs[Max[{x}] - Min[{x}]] < band,
right___} :> (Sow[{x}]; {right})][[2]]
It works correctly for both of the original small test lists. It also looks generally correct, but for larger lists it often misses some bands, which can be seen by comparison with the other two methods. I wasn't so far able to localize the bug, since the code seems pretty transparent.
I'd try this instead:
thisList /. {___, Longest[a : Repeated[_, {3, Infinity}]], ___} :>
{a} /; Abs[Max@{a} - Min@{a}] < 1
where Repeated[_, {3, Infinity}]
guarantees that you get at least 3 terms, and Longest
ensures it gives you the longest run possible. As a function,
Clear[f]
f[list_List, band_, minlen_Integer?Positive] := f[list, band, minlen, Infinity]
f[list_List, band_,
minlen_Integer?Positive, maxlen_?Positive] /; maxlen >= minlen :=
list /. {___, Longest[a : Repeated[_, {minlen, maxlen}]], ___} :> {a} /;
Abs[Max@{a} - Min@{a}] < band
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With