Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Enumerate Possible Matches of Regular Expression in Java

I want to enumerate all the possible values of a finite regular expression in Java for testing purposes.

For some context, I have a regular expression that I'm using to match allowable color values in words. Here's a shortened version of it as an example:

(white|black)|((light|dark) )?(red|green|blue|gray)

I wanted to create a unit test that would enumerate all these values and pass each of them to my utility class which produces a Color object from these, that way if I change the regular expression, my unit tests will fail if an error occurs (i.e. the new color value is unsupported).

I know enumeration is possible, of course (see this question), but is there an existing library for Java which will enumerate all the possible matches for a regex?

Edit: I've implemented a library that does this. See my answer below for links.

like image 687
Brian Avatar asked Dec 03 '12 17:12

Brian


People also ask

How do you match a regular expression?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

How many types of regular expression are there?

There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression.

How do you find the number of matches in a regular expression?

To count the number of regex matches, call the match() method on the string, passing it the regular expression as a parameter, e.g. (str. match(/[a-z]/g) || []). length . The match method returns an array of the regex matches or null if there are no matches found.


2 Answers

You are right, didn't find such a tool online as well but you can try Xeger from google

it can create a random matching string from a regexp, and with some code tweaking might do what you want. generation a random match:

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);

Xeger code is very simple, it consists of 2 files which contain 5 methods between them..
it uses dk.brics.automaton to conver the regex to an automaton, then goes over the automaton transitions making random choices in every node.

the main function is generate:

   private void generate(StringBuilder builder, State state) {
    List<Transition> transitions = state.getSortedTransitions(true);
    if (transitions.size() == 0) {
        assert state.isAccept();
        return;
    }
    int nroptions = state.isAccept() ? transitions.size() : transitions.size() - 1;
    int option = XegerUtils.getRandomInt(0, nroptions, random);
    if (state.isAccept() && option == 0) {          // 0 is considered stop
        return;
    }
    // Moving on to next transition
    Transition transition = transitions.get(option - (state.isAccept() ? 1 : 0));
    appendChoice(builder, transition);
    generate(builder, transition.getDest());
}

you can see that in order to change it so you get all possible matches, you need to iterate over all possible combinations in every possible node (like incrementing a multi digit counter) you will need a hash to prevent loops, but that shouldn't take more than 5 senconds to code..

i would also suggest first checking that the regex is actually finate, by checking that it doesn't have *,+ and other symbols that make this action impossible (just to make this a complete tool for reuse)...

like image 96
Amitbe Avatar answered Oct 09 '22 08:10

Amitbe


For future browsers coming to this question, I wrote a library that uses dk.brics.automaton using a similar approach to Xeger from the accepted answer and published it. You can find it:

  • On GitHub
  • On the project site
  • In Maven Central

To add it as a dependency:

Maven

<dependency>
    <groupId>com.navigamez</groupId>
    <artifactId>greex</artifactId>
    <version>1.0</version>
</dependency>

Gradle

compile 'com.navigamez:greex:1.0'

Sample Code

Using this question as an example:

GreexGenerator generator = new GreexGenerator("(white|black)|((light|dark) )?(red|green|blue|gray)");
List<String> matches = generator.generateAll();
System.out.println(matches.size()); // "14"
like image 25
Brian Avatar answered Oct 09 '22 10:10

Brian