A few weeks ago Lembik asked the following question:
A period
p
of a stringw
is any positive integerp
such thatw[i]=w[i+p]
whenever both sides of this equation are defined. Letper(w)
denote the size of the smallest period ofw
. We say that a stringw
is periodic iffper(w) <= |w|/2
.
So informally a periodic string is just a string that is made up from another string repeated at least once. The only complication is that at the end of the string we don't require a full copy of the repeated string as long as it is repeated in its entirety at least once.
For, example consider the string x = abcab
. per(abcab) = 3
as x[1] = x[1+3] = a
, x[2]=x[2+3] = b
and there is no smaller period. The string abcab
is therefore not periodic. However, the string ababa
is periodic as per(ababa) = 2
.
As more examples, abcabca
, ababababa
and abcabcabc
are also periodic.
For those who like regexes, this one detects if a string is periodic or not:
\b(\w*)(\w+\1)\2+\b
The task is to find all maximal periodic substrings in a longer string. These are sometimes called runs in the literature.
A substring
w[i,j]
ofw
is a maximal periodic substring (run) if it is periodic and neitherw[i-1] = w[i-1+p]
norw[j+1] = w[j+1-p]
. Informally, the "run" cannot be contained in a larger "run" with the same period.
Because two runs can represent the same string of characters occurring in different places in the overall string, we will represent runs by intervals. Here is the above definition repeated in terms of intervals.
A run (or maximal periodic substring) in a string
T
is an interval[i...j]
withj>=i
, such that
T[i...j]
is a periodic word with the periodp = per(T[i...j])
- It is maximal. Formally, neither
T[i-1] = T[i-1+p]
norT[j+1] = T[j+1-p]
. Informally, the run cannot be contained in a larger run with the same period.
Denote by RUNS(T)
the set of runs in string T
.
Examples of runs
The four maximal periodic substrings (runs) in string T = atattatt
are T[4,5] = tt
, T[7,8] = tt
, T[1,4] = atat
, T[2,8] = tattatt
.
The string T = aabaabaaaacaacac
contains the following 7 maximal periodic substrings (runs):
T[1,2] = aa
, T[4,5] = aa
, T[7,10] = aaaa
, T[12,13] = aa
, T[13,16] = acac
, T[1,8] = aabaabaa
, T[9,15] = aacaaca
.
The string T = atatbatatb
contains the following three runs. They are:
T[1, 4] = atat
, T[6, 9] = atat
and T[1, 10] = atatbatatb
.
Here I am using 1-indexing.
The goal
Write code so that for each integer n starting at 2, you output the largest numbers of runs contained in any binary string of length n
.
Example optima
In the following: n, optimum number of runs, example string
.
2 1 00
3 1 000
4 2 0011
5 2 00011
6 3 001001
7 4 0010011
8 5 00110011
9 5 000110011
10 6 0010011001
11 7 00100110011
12 8 001001100100
13 8 0001001100100
14 10 00100110010011
15 10 000100110010011
16 11 0010011001001100
17 12 00100101101001011
18 13 001001100100110011
19 14 0010011001001100100
Is there a faster way to find the optimum number of runs for increasing values of
n
than a naive O(n^2 2^n) time approach?
A generational algorithm to find all solutions
The Idea
In every string the last character can only contribute to a limited number of runs.
A last 0 can only add a run to
10 + 0 => 100
since in
00 + 0 => 000
it's only a repetition. Anf if it adds that minimal run, the next possible minimal run to add is
110010 + 0 => 1100100
note again
010010 + 0 => 0100100
is not an additional run, it's a repetition. The next possible adds are
111001001100100
1111001001100100111001001100100
...
The digits can vary but the minimal lengths are
3, 7, 15, 31
which is
4^1 - 1, 4^2 - 1, ..., 4^n - 1
At string start there is no need for a different character, thus
maxaddlast = 4^n - 2
yields the maximum number of runs which can be added with by adding the last character.
The Algorithm
The Seed
The remaining problem is to size the stack to collect all variants which are required for future seeds.
Since there is not enough data to guess a valid formula an adaptive algorithm is choosen:
The Result
length 104 with 91 runs
is reached within 600 seconds. Then the memory is used up with default settings. Use -Xmx16G or more. For larger numbers the code must be modified to maintain the seed on disk, not in memory.
And it's lot faster than the brute force method.
** The Code **
And here is my example code in Java:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import de.bb.util.Pair;
/**
* A search algorithm to find all runs for increasing lengths of strings of 0s
* and 1s.
*
* This algorithm uses a seed to generate the candidates for the next search.
* The seed contains the solutions for rho(n), rho(n) - 1, ..., minstart(n).
* Since the seed size is unknown, it starts with a minimal seed: minstart(n) =
* rho(n) - 1; After the solutions are calculated the all seeds are checked. If
* a seed with minstart(n) was used, that minstart(n) gets decremented and the
* search is restarted at position n + 1. This guarantees that the seed is
* always large enough.
*
* Optional TODO: Since the seed can occupy large amounts of memory, the seed is
* maintained on disk.
*
* @author Stefan "Bebbo" Franke (c) 2016
*/
public class MaxNumberOfRunsAdaptive {
private static long start;
private ArrayList<Pair<byte[], ArrayList<Integer>>> seed = new ArrayList<>();
private int max;
private ArrayList<ArrayList<Pair<byte[], ArrayList<Integer>>>> nextSeedStack;
private ArrayList<Integer> maxs = new ArrayList<>();
private ArrayList<Integer> diffs = new ArrayList<>();
private ArrayList<Integer> totals = new ArrayList<>();
private int total;
private byte[] buffer;
public static void main(String[] args) {
int limit = 9999;
if (args.length == 1) {
try {
limit = Integer.parseInt(args[0]);
} catch (Exception e) {
}
}
start = System.currentTimeMillis();
new MaxNumberOfRunsAdaptive().run(limit);
long took = (System.currentTimeMillis() - start) / 100;
System.out.println("took " + (took / 10.) + "s");
}
/**
* Find a string with the max number of runs for all lengths from 2 to
* limit;
*
* @param limit
* the limit to stop calculation.
*/
private void run(int limit) {
maxs.add(0);
maxs.add(0);
diffs.add(0);
diffs.add(1);
totals.add(0);
totals.add(0);
ArrayList<Integer> n0 = new ArrayList<Integer>();
n0.add(0);
seed.add(Pair.makePair(new byte[] { '0' }, n0));
saveSeed(2);
for (int i = 2; i <= limit;) {
int restart = compose(i);
if (restart < i) {
System.out.println("*** restarting at: " + restart + " ***");
i = restart;
loadSeed(i);
total = totals.get(i - 1);
} else {
saveSeed(i + 1);
++i;
}
}
}
/**
* Load the seed for the length from disk.
*
* @param length
*/
private void loadSeed(int length) {
try {
seed.clear();
final FileReader fr = new FileReader("seed-" + length + ".txt");
final BufferedReader br = new BufferedReader(fr);
for (String line = br.readLine(); line != null; line = br.readLine()) {
final int space = line.indexOf(' ');
final byte[] b = line.substring(0, space).getBytes();
final String sends = line.substring(space + 2, line.length() - 1);
final ArrayList<Integer> ends = new ArrayList<>();
for (final String s : sends.split(",")) {
ends.add(Integer.parseInt(s.trim()));
}
seed.add(Pair.makePair(b, ends));
}
fr.close();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
/**
* Save the seed for the given length to the disk.
*
* @param length
* the length
*/
private void saveSeed(int length) {
try {
final FileWriter fos = new FileWriter("seed-" + length + ".txt");
for (final Pair<byte[], ArrayList<Integer>> p : seed) {
fos.write(new String(p.getFirst()) + " " + p.getSecond().toString() + "\n");
}
fos.close();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
/**
* Compose new strings from all available bases. Also collect the candidates
* for the next base.
*/
private int compose(int length) {
max = 0;
int nextStackSize;
if (diffs.size() > length)
nextStackSize = diffs.get(length) + 1;
else
nextStackSize = diffs.get(length - 1) - 1;
if (nextStackSize < 2)
nextStackSize = 2;
// setup collector for next bases
nextSeedStack = new ArrayList<>();
for (int i = 0; i < nextStackSize; ++i) {
nextSeedStack.add(new ArrayList<Pair<byte[], ArrayList<Integer>>>());
}
buffer = new byte[length];
// extend the bases
for (Pair<byte[], ArrayList<Integer>> e : seed) {
final byte[] s = e.getFirst();
System.arraycopy(s, 0, buffer, 0, length - 1);
if (s.length < 3 || s[s.length - 1] == '1' || s[s.length - 2] == '1' || s[s.length - 3] == '1') {
buffer[length - 1] = '0';
test(length, e.getSecond());
}
if (s.length < 3 || s[s.length - 1] == '0' || s[s.length - 2] == '0' || s[s.length - 3] == '0') {
buffer[length - 1] = '1';
test(length, e.getSecond());
}
}
long took = (System.currentTimeMillis() - start) / 100;
final ArrayList<String> solutions = new ArrayList<String>();
for (Pair<byte[], ArrayList<Integer>> p : nextSeedStack.get(nextSeedStack.size() - 1)) {
solutions.add(new String(p.getFirst()));
}
total += solutions.size();
if (totals.size() <= length)
totals.add(0);
totals.set(length, total);
if (maxs.size() <= length) {
maxs.add(0);
}
maxs.set(length, max);
System.out.println(length + " " + max + " " + (took / 10.) + " " + total + " " + solutions);
seed.clear();
// setup base for next level
for (ArrayList<Pair<byte[], ArrayList<Integer>>> t : nextSeedStack) {
seed.addAll(t);
}
if (diffs.size() <= length) {
diffs.add(1);
}
int restart = length;
// check for restart
for (final String b : solutions) {
for (int i = 2; i < b.length(); ++i) {
int diff = maxs.get(i) - countRuns(b.substring(0, i));
if (diff >= diffs.get(i)) {
if (i < restart)
restart = i;
diffs.set(i, diff + 1);
}
}
}
System.out.println(diffs);
return restart;
}
/**
* Test the current buffer and at it to the next seed stack,
*
* @param l
* the current length
* @param endRuns
* the end runs to store
*/
void test(final int l, final ArrayList<Integer> endRuns) {
final ArrayList<Integer> r = incrementalCountRuns(l, endRuns);
final int n = r.get(r.size() - 1);
// shift the nextBaseStack
while (max < n) {
nextSeedStack.remove(0);
nextSeedStack.add(new ArrayList<Pair<byte[], ArrayList<Integer>>>());
++max;
}
// add to set in stack, if in stack
final int index = nextSeedStack.size() - 1 - max + n;
if (index >= 0)
nextSeedStack.get(index).add(Pair.makePair(buffer.clone(), r));
}
/**
* Find incremental the runs incremental.
*
* @param l
* the lengths
* @param endRuns
* the runs of length-1 ending at length -1
* @return a new array containing the end runs plus the length
*/
private ArrayList<Integer> incrementalCountRuns(final int l, final ArrayList<Integer> endRuns) {
final ArrayList<Integer> res = new ArrayList<Integer>();
int sz = endRuns.size();
// last end run dummy - contains the run count
int n = endRuns.get(--sz);
int pos = 0;
for (int i = l - 2; i >= 0; i -= 2) {
int p = (l - i) / 2;
// found something ?
if (equals(buffer, i, buffer, i + p, p)) {
while (i > 0 && buffer[i - 1] == buffer[i - 1 + p]) {
--i;
}
int lasti = -1;
while (pos < sz) {
lasti = endRuns.get(pos);
if (lasti <= i)
break;
lasti = -1;
++pos;
}
if (lasti != i)
++n;
res.add(i);
}
}
res.add(n);
return res;
}
/**
* Compares one segment of a byte array with a segment of a 2nd byte array.
*
* @param a
* first byte array
* @param aOff
* offset into first byte array
* @param b
* second byte array
* @param bOff
* offset into second byte array
* @param len
* length of the compared segments
* @return true if the segments are equal, otherwise false
*/
public final static boolean equals(byte a[], int aOff, byte b[], int bOff, int len) {
if (a == null || b == null)
return a == b;
while (len-- > 0)
if (a[aOff + len] != b[bOff + len])
return false;
return true;
}
/**
* Simple slow stupid method to count the runs in a String.
*
* @param s
* the string
* @return the count of runs.
*/
static int countRuns(String s) {
int n = 0;
int l = s.length();
for (int i = 0; i < l - 1; ++i) {
for (int k = i + 1; k < l; ++k) {
int p = 0;
while (i + p < k && k + p < l) {
if (s.charAt(i + p) != s.charAt(k + p))
break;
++p;
}
if (i + p == k) {
int jj = k + p - 1;
if (i > 0 && s.charAt(i - 1) == s.charAt(i - 1 + p)) {
continue;
}
while (jj + 1 < l && s.charAt(jj + 1) == s.charAt(jj + 1 - p)) {
++jj;
++k;
}
++n;
}
}
}
return n;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With