Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression to define some binary sequence

Tags:

regex

How would you write a regular expression to define all strings of 0's and 1's that, as a binary number, represent an integer that is multiple of 3.

Some valid binary numbers would be:

11
110
1001
1100
1111
like image 744
Jaelebi Avatar asked May 15 '09 06:05

Jaelebi


2 Answers

Using the DFA here we can make a regular expression the following way, where A, B, C represent the states of the DFA.

A = 1B + 0A
B = 1A + 0C
C = 1C + 0B

C = 1*0B // Eliminate recursion

B = 1A + 0(1*0B)
B = 01*0B + 1A
B = (01*0)*1A // Eliminate recursion

A = 1(01*0)*1A + 0A
A = (1(01*0)*1 + 0)A
A = (1(01*0)*1 + 0)* // Eliminate recursion

Resulting in a PCRE regex like:

/^(1(01*0)*1|0)+$/

Perl test/example:

use strict;

for(qw(
11
110
1001
1100
1111
0
1
10
111
)){
    print "$_ (", eval "0b$_", ") ";
    print /^(1(01*0)*1|0)+$/? "matched": "didnt match";
    print "\n";
}

Outputs:

11 (3) matched
110 (6) matched
1001 (9) matched
1100 (12) matched
1111 (15) matched
0 (0) matched
1 (1) didnt match
10 (2) didnt match
111 (7) didnt match
like image 94
Qtax Avatar answered Oct 07 '22 20:10

Qtax


When you divide a number by three, there are only three possible remainders (0, 1 and 2). What you're aiming at is to ensure the remainder is 0, hence a multiple of three.

This can be done by an automaton with the three states:

  • ST0, multiple of 3 (0, 3, 6, 9, ....).
  • ST1, multiple of 3 plus 1 (1, 4, 7, 10, ...).
  • ST2, multiple of 3 plus 2 (2, 5, 8, 11, ...).

Now think of any non-negative number (that's our domain) and multiply it by two (tack a binary zero on to the end). The transitions for that are:

ST0 -> ST0 (3n * 2 = 3 * 2n, still a multiple of three).
ST1 -> ST2 ((3n+1) * 2 = 3*2n + 2, a multiple of three, plus 2).
ST2 -> ST1 ((3n+2) * 2 = 3*2n + 4 = 3*(2n+1) + 1, a multiple of three, plus 1).

Also think of any non-negative number, multiply it by two then add one (tack a binary one on to the end). The transitions for that are:

ST0 -> ST1 (3n * 2 + 1 = 3*2n + 1, a multiple of three, plus 1).
ST1 -> ST0 ((3n+1) * 2 + 1 = 3*2n + 2 + 1 = 3*(2n+1), a multiple of three).
ST2 -> ST2 ((3n+2) * 2 + 1 = 3*2n + 4 + 1 = 3*(2n+1) + 2, a multiple of three, plus 2).

This idea is that, at the end, you need to finish up in state ST0. However, given that there can be an arbitrary number of sub-expressions (and sub-sub-expressions), it does not lend itself easily to reduction to a regular expression.

What you have to do is allow for any of the transition sequences that can get from ST0 to ST0 then just repeat them:

These boil down to the two RE sequences:

ST0 --> ST0                                      :  0+
    [0]
ST0 --> ST1 (--> ST2 (--> ST2)* --> ST1)* --> ST0:  1(01*0)*1
    [1]     ([0]     ([1]    )* [0]    )* [1]

or the regex:

(0+|1(01*0)*1)+

This captures the multiples of three, or at least the first ten that I tested. You can try as many as you like, they'll all work, that's the beauty of mathematical analysis rather than anecdotal evidence.

like image 36
paxdiablo Avatar answered Oct 07 '22 18:10

paxdiablo