Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best way to find the period of a (repeating) list in Mathematica?

What is the best way to find the period in a repeating list?

For example:

a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}

has repeat {4, 5, 1, 2, 3} with the remainder {4, 5, 1, 2} matching, but being incomplete.

The algorithm should be fast enough to handle longer cases, like so:

b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]

The algorithm should return $Failed if there is no repeating pattern like above.

like image 260
Arnoud Buzing Avatar asked Nov 18 '11 00:11

Arnoud Buzing


4 Answers

Please see the comments interspersed with the code on how it works.

(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]

(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)

(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]

(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
  nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
  Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
  ]

(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
   Do[
    If[testPeriod[p, a], Return[p]],
    {p, fp, Quotient[Length[a], 2], fp}
    ]
   ]

Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.

I was testing with

b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];

(Note the low integer values: there will be lots of repeating elements within the same period *)


EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:

(* Leonid's reduce[] *)

myPosition = Compile[
  {{lst, _Integer, 1}, {val, _Integer}}, 
  Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0}, 
    For[i = 1, i <= Length[lst], i++, 
      If[lst[[i]] == val, pos[[++ctr]] = i]
    ]; 
    Take[pos, ctr]
  ], 
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
]

reduce[a_] := Differences@myPosition[a, First[a]]

Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:

Clear[testPeriod]
testPeriod = 
 Compile[{{p, _Integer}, {a, _Integer, 1}}, 
  Drop[a, p] === Drop[a, -p]]
like image 185
Szabolcs Avatar answered Nov 18 '22 12:11

Szabolcs


Above methods are better if you have no noise. If your signal is only approximate then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.

noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];

We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.

sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];

Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.

In[419]:= 
thresh1 = 
  Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1, 
    0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 = 
  Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1, 
    0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]

Out[420]= 114

Out[422]= 100

Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.

approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5

So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.

Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).

Daniel Lichtblau

like image 7
Daniel Lichtblau Avatar answered Nov 18 '22 12:11

Daniel Lichtblau


The following assumes that the cycle starts on the first element and gives the period length and the cycle.

findCyclingList[a_?VectorQ] :=
  Module[{repeats1, repeats2, cl, cLs, vec}, 
  repeats1 = Flatten@Differences[Position[a, First[a]]];
  repeats2 = Flatten[Position[repeats1, First[repeats1]]]; 
  If[Equal @@ Differences[repeats2] && Length[repeats2] > 2(* 
   is potentially cyclic - first element appears cyclically *),
   cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
   cLs = Partition[a, First[cl]];
   If[SameQ @@ cLs  (* candidate cycles all actually the same *), 
    vec = First[cLs];
    {Length[vec], vec}, $Failed], $Failed]  ]

Testing

b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];

findCyclingList[a]

{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38, 
  23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33, 
  40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
   31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
   43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31, 
  42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}

b1 = RandomInteger[10000, {100}]; 
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];

findCyclingList[a1]

{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203, 
  3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515, 
  5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830, 
  2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389, 
  6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695, 
  4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429, 
  7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744, 
  6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284, 
  3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774, 
  1289, 6352}}

This case should fail because it isn't cyclical.

findCyclingList[Join[b, Take[b, 11], b]]

$Failed

I tried to something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t} but it just doesn't work for me.

like image 5
Verbeia Avatar answered Nov 18 '22 13:11

Verbeia


Does this work for you?

period[a_] := 
   Quiet[Check[
      First[Cases[
         Table[
            {k, Equal @@ Partition[a, k]}, 
            {k, Floor[Length[a]/2]}], 
         {k_, True} :> k
         ]], 
      $Failed]]

Strictly speaking, this will fail for things like

a = {1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5}

although this can be fixed by using something like:

(Equal @@ Partition[a, k]) && (Equal @@ Partition[Reverse[a], k])

(probably computing Reverse[a] just once ahead of time.)

like image 3
Brett Champion Avatar answered Nov 18 '22 11:11

Brett Champion