Is there a function that searches a sequence of elements for a subsequence? I am looking for an analogue of StringPosition
for List
s. In my current application I am working with integer lists, but I'd be interested in a general FindSequence[list, pattern, n]
function which will find the first n
occurrences of pattern
in list
.
Here's a toy example:
Generate some data:
In[1]:= $HistoryLength = 0
Out[1]= 0
In[2]:= Timing[digits = First[RealDigits[\[Pi], 2, 10000000]];]
Out[2]= {26.5, Null}
Let's convert it to a string so we can compare to StringPosition
. This is very slow an memory hungry. (The memory is freed when the evaluation finishes.)
In[3]:= Timing[str = StringJoin[ToString /@ digits];]
Out[3]= {43.813, Null}
I am looking for this subsequence:
In[4]:= patt = {1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 1, 1};
In[5]:= strpatt = StringJoin[ToString /@ patt];
Searching the string is very fast:
In[6]:= StringPosition[str, strpatt] // Timing
Out[6]= {1.047, {{5737922, 5737943}}}
This is a simple implementation of searching for numerical arrays. It's slower than StringPosition
:
In[7]:= Timing[
corr = ListCorrelate[patt, digits];
Select[Flatten@Position[corr, patt.patt],
digits[[# ;; # + Length[patt] - 1]] === patt &]
]
Out[7]= {2.234, {5737922}}
Summary:
{1,0,1}
, or general ones like {1,_,1}
, though these latter ones may introduce complications.)I expect this will have many solutions, some fast, some more elegant, some more general :-)
Related questions:
Interesting reading:
EDIT:
I just found the undocumented LongestCommonSubsequencePositions
. LongestCommonSubsequencePositions[a, b]
will find the longest common subsequence of the lists a
and b
, and return position of its first occurrence only in both a
and b
. (The documented LongestCommonSubsequence
, which I was not aware of, will only return the subsequence itself, not its position.)
It is slower than the alternatives above, but it works on general lists that can contain any expression.
In[57]:= LongestCommonSubsequencePositions[digits, patt] // Timing
Out[57]= {5.25, {{5737922, 5737943}, {1, 22}}}
You can use ReplaceList
with a "prefix" and "suffix" of ___
and match the whole list. This gives you all the replacements that can be made (as opposed to Replace
). The position of your pattern is then simply the length of the prefix + 1. It's pretty fast as well:
In[40]:= Timing[ReplaceList[digits, Join[{pre___}, patt, {___}] :> Length[{pre}]
+ 1]]
Out[40]= {1.3059, {5737922}}
Edit: figured it's slightly more elegant to use a delayed rule than to map Length
afterwards.
Please have a look at functions seqPos
(general lists) and seqposC
(integer lists, compiled), which do exactly what you ask for, and are fast. I used them in this answer (for the question you actually linked to).
Here are the timing results for various solutions:
In[15]:= seqPos[digits, patt] // Timing
Out[15]= {1.297, {5737922}}
In[16]:= seqposC[digits, patt] // Timing
Out[16]= {0.125, {5737922}}
In[17]:=
Timing[corr = ListCorrelate[patt, digits];
Select[Flatten@Position[corr, patt.patt],
digits[[# ;; # + Length[patt] - 1]] === patt &]]
Out[17]= {0.844, {5737922}}
In[18]:= Timing[
ReplaceList[digits, Join[{pre__}, patt, {___}] :> Length[{pre}] + 1]]
Out[18]= {0.953, {5737922}}
In[19]:= AbsoluteTiming[cf[digits, patt]]
Out[19]= {3.1914063, 5737922}
These indicate that your approach with ListCorrelate
is not bad at all. My first function seqPos
(it is actually due to Norbert Pozar) is a bit slower but then it is completely general, while seqposC
is much faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With