Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check number divisibility with regular expressions

Tags:

regex

math

Given a decimal number N as a string of digits, how do I check if it's divisible by M using regular expressions only, without converting to int?

M=2, 4, 5, 10 are obvious. For M=3 some interesting insights here: Regex filter numbers divisible by 3

Can anyone provide a solution for M=7, 9, 11, 13 etc? A generic one?

Testing code (in python, but feel free to use any language):

M = your number, e.g. 2
R = your regexp, e.g., '^[0-9]*[02468]$'

import re
for i in range(1, 2000):
    m = re.match(R, str(i))
    if i % M:
        assert not m, '%d should not match' % i
    else:
        assert m, '%d must match' % i

For those curious, here's an example for M=3 (assumes an engine with recursion support):

^
(
    | [0369]+ (?1)
    | [147] (?1) [258] (?1)
    | [258] (?1) [147] (?1)
    | ( [258] (?1) ) {3}
    | ( [147] (?1) ) {3}
)
$

Upd: for more discussion and examples see this thread. The expression posted there turned out to be buggy (fails on 70*N), but "how to get there" part is very educative.

like image 944
georg Avatar asked Sep 13 '12 10:09

georg


1 Answers

The perhaps-surprising result is that such a regular expression always exists. The much-less-surprising one is that it's usually not useful.

The existence result comes from the correspondence between deterministic finite automata (DFA) and regular expressions. So let's make a DFA. Denote the modulus by N (it doesn't need to be prime) and denote the numerical base by B, which is 10 for ordinary decimal numbers. The DFA with N states labelled 0 through N-1. The initial state is 0. The symbols of the DFA are the digits 0 through B-1. The states represent the remainder of the left-prefix of the input string, interpreted as an integer, when divided by N. The edges represent the change of state when you add a digit to the right. Arithmetically, this is the state map S(state,digit) = B * state + digit (modulo N). The accepting state is 0, since a zero remainder indicates divisibility. So we have a DFA. The languages recognized by the DFA are the same as that recognized by regular expressions, so one exists. So while this is interesting, it's not helpful, since it doesn't tell you much about how to determine the expression.

If you want a generic algorithm, it's easy to build such a DFA at run-time and populate its state table by direct computation. Initialization is just a pair of nested loops with run time O(M * N). Recognition with the machine is a constant time per input character. This is perfectly fast, but doesn't use a regexp library, if that's what you really need.

In getting toward an actual regular expression, we need to look at Fermat's Little Theorem. From the theorem, we know that B^(N-1) == 1 (modulo N). For example, when N=7 and B=10, what this means is that every block of 6 digits is equivalent to some single digit in the range 0 .. 6 for the purpose of divisibility. The exponent can be smaller than N-1; in general it's a factor of the Euler totient function of N. Call the size of the block D. There are N regular expressions for blocks of D digits, each one representing a particular equivalence class of remainders modulo N. At most, these expressions have length O(B^D), which is large. For N=7 that's a set of regular expressions a million characters long; I would imagine that would break most regexp libraries.

This relates to how the expression in the example code works; the expression (?1) is matching strings that are equal to 0 (mod 3). This works with N=3 since 10^1 == 1 (mod 3) which means that A0B == AB (mod 3). This is more complicated when the exponent is greater than 1, but the principle is the same. (Note that the example code uses a recognizer which is more than just regular expressions, strictly speaking.) The expressions [0369], [147], and [258] are the regular expressions for the digits 0, 1, and 2 in a modulo 3 expression. Generalizing, you would use the regexp-digits as above in an analogous manner.

I'm not providing code because (1) it would take longer to write than this answer has been, and (2) I really doubt it would execute in any known implementation.

like image 149
eh9 Avatar answered Oct 11 '22 18:10

eh9