Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Fibonacci numbers using regex

I found the following code example on this blog post :

final String FIBONACCI = 
   "(?x) .? | ( \\2?+ (\\1|^.) )* ..";

for (int n = 0; n < 10000; n++) {
   String s = new String(new char[n]);
   if (s.matches(FIBONACCI)) {
      System.out.printf("%s ", n);
   }
}

output: 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 ...

How does (?x) .? | ( \\2?+ (\\1|^.) )* .. match Fibonacci numbers?

like image 962
Melih Altıntaş Avatar asked Sep 04 '13 21:09

Melih Altıntaş


1 Answers

(?x) .? | ( \\2?+ (\\1|^.) )* ..

There are a lot of things going on here which may confuse. I will go through each of these things in order to explain why the algorithm works.

  1. The match is being done on a string with the length of the regex, not the actual number. The only real data in the string is its length.

  2. \\The double backslashes are just because in Java string literals backslashes have to be backslashed so that it is clear you aren't escaping something else. I won't show them in any future code in this answer.

  3. (?x): This enables extended regex mode. In this mode whitespace that is not backslashed or within a character class is ignored, allowing the regular expression to be broken into more readable parts with embedded comments. [sarand.com].

  4. .?: This will match 0 or 1 character strings. This match is only used for the f(0), f(1) and f(2) cases, otherwise it will be discarded.

  5. |: This means that if the first attempt to match 1 or two characters didn't work, then try to match everything on the right of it.

  6. (: This opens the first group (referenced by \1 later on).

  7. (\2?+ The + makes the ? a possessive quantifier. In this case the result is that the ? means use the \2 backreference if it is defined and the + means don't go back and try not using it if the regex doesn't work with it.

  8. (\1|^.): This will match either everything that has been matched so far or a single character. This of course means that the first "everything matched so far" is a single character. Since this is the second regex, it is also the new \2

  9. )*: This will repeat the algorithm. Every time it repeats it will define new values for \1 and \2. These values will be equal to F(n-1) and F(n-2) for the current iteration which will be F(n). Each iteration will be added to the previous, meaning you have a sum of F(n) 0 to n. Try running the algorithm through your head for some smaller numbers to get the idea.

  10. ..: One dot is required to match the f(1) that is not included in the sum, the second is because the Second Identity of Fibonacci Numbers states that the sum of a sequence of fibonacci numbers is a fibonnaci number minus one. (1)

    http://i.stack.imgur.com/SaRUR.png

  11. Walking through the replacements you can see how this will continue to add Fibonacci numbers until filling the string. The first iteration matches the ^., so 1. The second iteration matches the previous partial match with the \2 as well as the entire previous match with the \1. That makes two for the second iteration. Third iteration takes that second part of the match from the second iteration (1) as well as the entire second iteration (2). This makes three for the third iteration. Add the iterations together and you have a sum of fib numbers.

Please see Why does Java regex engine throw StringIndexOutOfBoundsException on a + repetition? for more information on why this recurrence actually works.

like image 151
jmh Avatar answered Sep 27 '22 19:09

jmh