Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

a regular expression generator for number ranges

I checked on the stackExchange description, and algorithm questions are one of the allowed topics. So here goes.

Given an input of a range, where begin and ending numbers have the same number of digits (say, 2, 3, or 4), I want to write code to generate a set of regular expressions which, when checked against a number in turn, tell me whether that number is in the original range.

For example: if the range is 145-387, then 146, 200, and 280 would all match one of the regular expressions generated, and 144, 390 (used to say 290), and 445 (used to say 345) would not.

I have been thinking the result would be a list of regular expressions like:

14[5-9]             // match 145-149
1[5-9]0-9]          // 150-199
2[0-9][0-9]         // 200-299
3[0-7][0-9]         // 300-379
38[0-7]             // 380-387

and then software checking the number would test to see if the 3-digit code being tested matched any of these.

So what's the best way to generate the set of expressions?

The latest (in a series) that I've come up with is to:

  1. determine the first digit at which the two range numbers differ (1145-1158, first different digit is the 3rd)
  2. for the different digits, determine if their first digits differ by more than one -- if so, the range for that gets its own regex (200-299 in our example)
  3. to get lower ranges: for each other digit: prefix by the first digit(s) from the beginning of the range, increment the digit by one, pad with 0s to the same length, and pair with a number that has 9 in the digit place and all the padded places. In our example, increment 4 to 5, pad to get 150, generate the regex to handle 150-199.
  4. to get higher ranges: for each other digit: prefix with first digit(s) from end of range, decrement digit by one, pad rest with 0s, pair with a number with 9s in all the padded 0 places and the decremented digit. In our example, the regex to handle 300-379.

Am I missing something? There are details even in the above that I'm glossing over, it seems like something that would benefit from an algorithmic sword slashing through the details. But the other things I've come up with are even messier than this.

like image 967
arcy Avatar asked Nov 04 '15 01:11

arcy


People also ask

How do you specify a number range in regex?

Example: Regex Number Range 1-20 Range 1-20 has both single digit numbers (1-9) and two digit numbers (10-20). For double digit numbers we have to split the group in two 10-19 (or in regex: "1[0-9]") and 20. Then we can join all these with an alternation operator to get "([1-9]|1[0-9]|20)".

Can you use regex for numbers?

The regex [0-9] matches single-digit numbers 0 to 9. [1-9][0-9] matches double-digit numbers 10 to 99. That's the easy part. Matching the three-digit numbers is a little more complicated, since we need to exclude numbers 256 through 999.

What is the regular expression for numbers only?

To check for all numbers in a field To get a string contains only numbers (0-9) we use a regular expression (/^[0-9]+$/) which allows only numbers. Next, the match() method of the string object is used to match the said regular expression against the input value.

How do I create a regular expression?

If you want to match for the actual '+', '. ' etc characters, add a backslash( \ ) before that character. This will tell the computer to treat the following character as a search character and consider it for matching pattern. Example : \d+[\+-x\*]\d+ will match patterns like "2+2" and "3*9" in "(2+2) * 3*9".


1 Answers

Here's my solution and an algorithm with complexity O(log n) (n is the end of the range). I believe it is the simplest one here:

Basically, split your task into these steps:

  1. Gradually "weaken" the start of the range.
  2. Gradually "weaken" the end of the range.
  3. Merge those two.

By "weaken", I mean finding the end of range that can be represented by simple regex for this specific number, for example:

145 -> 149,150 -> 199,200 -> 999,1000 -> etc.

Here's a backward one, for the end of the range:

387 -> 380,379 -> 300,299 -> 0

Merging would be the process of noticing the overlap of 299->0 and 200->999 and combining those into 200->299.

In result, you would get this set of numbers (first list intact, second one inverted:

145, 149, 150, 199, 200, 299, 300, 379, 380, 387

Now, here is the funny part. Take the numbers in pairs, and convert them to ranges:

145-149, 150-199, 200-299, 300-379, 380-387

Or in regex:

14[5-9], 1[5-9][0-9], 2[0-9][0-9], 3[0-7][0-9], 38[0-7]

Here's how the code for the weakening would look like:

public static int next(int num) {
    //Convert to String for easier operations
    final char[] chars = String.valueOf(num).toCharArray();
    //Go through all digits backwards
    for (int i=chars.length-1; i>=0;i--) {
        //Skip the 0 changing it to 9. For example, for 190->199
        if (chars[i]=='0') {
            chars[i] = '9';
        } else { //If any other digit is encountered, change that to 9, for example, 195->199, or with both rules: 150->199
            chars[i] = '9';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}

//Same thing, but reversed. 387 -> 380, 379 -> 300, etc
public static int prev(int num) {
    final char[] chars = String.valueOf(num).toCharArray();
    for (int i=chars.length-1; i>=0;i--) {
        if (chars[i] == '9') {
            chars[i] = '0';
        } else {
            chars[i] = '0';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}

The rest is technical details and is easy to implement. Here's an implementation of this O(log n) algorithm: https://ideone.com/3SCvZf

Oh, and by the way, it works with other ranges too, for example for range 1-321654 the result is:

[1-9]
[1-9][0-9]
[1-9][0-9][0-9]
[1-9][0-9][0-9][0-9]
[1-9][0-9][0-9][0-9][0-9]
[1-2][0-9][0-9][0-9][0-9][0-9]
3[0-1][0-9][0-9][0-9][0-9]
320[0-9][0-9][0-9]
321[0-5][0-9][0-9]
3216[0-4][0-9]
32165[0-4]

And for 129-131 it's:

129
13[0-1]
like image 117
bezmax Avatar answered Sep 16 '22 12:09

bezmax