Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

From within Java, how can I create a list of all possible numbers from a specific regular expression?

Tags:

java

regex

I have a strange problem, at least one I've never come across. I have a precondition where customers have simple regular expressions associated with labels. The labels are all they care about. What I would like to do is create a list of all possible numbers that would match each of these regular expressions. I would have logic that would warn me when the list is beyond a certain threshold.

Here is an example of the regular expression: 34.25.14.(227|228|229|230|243|244|245|246)

Lets say that these ip(s) are associated with ACME. Behind the scenes when the user selects ACME (in our UI), I'm filling out a filter object that contains all of those possible numbers and submitting them as an OR query to a highly specialized Vertica database.

I just can't determine an elegant way of creating a list of numbers from said regular expressions.

The others aspect of this, is that the java code within another portion of the product is using those regular expressions to show ACME by using a java Pattern.compile(), which means the customer 'could' create a complex regular expression. I've only seen them, so far, use something simple as shown above.

Is there a method that will generate a list based on regular expression?

Thanks for your time.

like image 646
D-Klotz Avatar asked Oct 16 '12 21:10

D-Klotz


People also ask

What is regex pattern in Java?

Regular Expressions or Regex (in short) in Java is an API for defining String patterns that can be used for searching, manipulating, and editing a string in Java. Email validation and passwords are a few areas of strings where Regex is widely used to define the constraints. Regular Expressions are provided under java.

What method should you use when you want to get all sequences matching a regex pattern in a string?

To find all the matching strings, use String's scan method.

How do I match a pattern in regex?

Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match itself. For example, the regex x matches substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also matches itself. For example, = matches "=" ; @ matches "@" .

What is the class in Java that is used to test string against a regular expression?

The Matcher class is used to match a given regular expression ( Pattern instance) against a text multiple times. In other words, to look for multiple occurrences of the regular expression in the text. The Matcher will tell you where in the text (character index) it found the occurrences.


1 Answers

Related:

A library that generates data matching a regular expression (with limitations): http://code.google.com/p/xeger/

Several solutions, such as conversing a regex into a grammar: Using Regex to generate Strings rather than match them


EDIT: Actually, you can make it work!!! The only thing to address is to impose some domain-specific constraints to preclude combinatorial explosion like a+.

If you add to the Xeger class something like this:

public void enumerate() {
    System.out.println("enumerate: \"" + regex + "\"");
    int level = 0;
    String accumulated = "";
    enumerate(level, accumulated, automaton.getInitialState());
}

private void enumerate(int level, String accumulated, State state) {
    List<Transition> transitions = state.getSortedTransitions(true);
    if (state.isAccept()) {
        System.out.println(accumulated);
        return;
    }
    if (transitions.size() == 0) {
        assert state.isAccept();
        return;
    }
    int nroptions = state.isAccept() ? transitions.size() : transitions.size() - 1;
    for (int option = 0; option <= nroptions; option++) {
        // Moving on to next transition
        Transition transition = transitions.get(option - (state.isAccept() ? 1 : 0));
        for (char choice = transition.getMin(); choice <= transition.getMax(); choice++) {
            enumerate(level + 1, accumulated + choice, transition.getDest());
        }
    }
}

... and something like this to XegerTest:

@Test
public void enumerateAllVariants() {
    //String regex = "[ab]{4,6}c";
    String regex = "34\\.25\\.14\\.(227|228|229|230|243|244|245|246)";
    Xeger generator = new Xeger(regex);
    generator.enumerate();
}

... you will get this:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running nl.flotsam.xeger.XegerTest
enumerate: "34\.25\.14\.(227|228|229|230|243|244|245|246)"
34.25.14.227
34.25.14.228
34.25.14.229
34.25.14.243
34.25.14.244
34.25.14.245
34.25.14.246
34.25.14.230
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.114 sec

... and, guess what. For "[ab]{4,6}c" it correctly produces 112 variants.

It's really a quick and dirty experiment, but it seems to work ;).

like image 89
full.stack.ex Avatar answered Oct 09 '22 10:10

full.stack.ex