
How do I generate text matching a regular expression from a regular expression?

Yup, you read that right. I need something that can generate random text from a regular expression: the text should be random, but still be matched by the regular expression. Such a tool doesn't seem to exist, but I could be wrong.

Just an example: such a library would take '[ab]*c' as input and generate samples such as:

abc
abbbc
bac

etc.

Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.

asked Oct 16 '09 15:10 by Wilfred Springer



2 Answers

I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)

This is the way you use it:

    String regex = "[ab]{4,6}c";
    Xeger generator = new Xeger(regex);
    String result = generator.generate();
    assert result.matches(regex);

answered Oct 07 '22 15:10 by Wilfred Springer


I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:

  1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).

  2. Use the result to construct an NFA.

  3. (Optional) Convert the NFA to a DFA.

  4. Randomly traverse the resulting automaton from the start state to any accepting state, recording the character emitted by each transition.

The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
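
The steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not a complete regex engine and not how Xeger works internally. The class name RegexSampler and its sample method are hypothetical. Only a restricted syntax is assumed: literal characters, character classes written as plain lists such as [ab] (no ranges, no negation), grouping with ( ), alternation with |, and the postfix operators *, + and ?. Escapes and {m,n} counts are not handled. The optional NFA-to-DFA conversion (step 3) is skipped, and the random walk makes no attempt to sample matching strings uniformly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: restricted regex -> Thompson-style NFA -> random walk.
final class RegexSampler {
    static final class State { final List<Edge> edges = new ArrayList<>(); }

    static final class Edge {
        final String chars;  // null = epsilon move; otherwise emit any one of these
        final State target;
        Edge(String chars, State target) { this.chars = chars; this.target = target; }
    }

    // NFA fragment with a single entry state and a single exit state.
    static final class Frag {
        final State start, end;
        Frag(State start, State end) { this.start = start; this.end = end; }
    }

    private final String src;
    private int pos = 0;
    private RegexSampler(String src) { this.src = src; }

    private boolean more() { return pos < src.length(); }
    private static void eps(State from, State to) { from.edges.add(new Edge(null, to)); }

    // alternation := concat ('|' concat)*
    private Frag alternation() {
        Frag first = concat();
        if (!more() || src.charAt(pos) != '|') return first;
        State s = new State(), e = new State();
        eps(s, first.start); eps(first.end, e);
        while (more() && src.charAt(pos) == '|') {
            pos++;
            Frag alt = concat();
            eps(s, alt.start); eps(alt.end, e);
        }
        return new Frag(s, e);
    }

    // concat := repeat*
    private Frag concat() {
        State s = new State();
        Frag acc = new Frag(s, s);  // matches the empty string
        while (more() && src.charAt(pos) != '|' && src.charAt(pos) != ')') {
            Frag next = repeat();
            eps(acc.end, next.start);
            acc = new Frag(acc.start, next.end);
        }
        return acc;
    }

    // repeat := atom ('*' | '+' | '?')?
    private Frag repeat() {
        Frag a = atom();
        if (!more() || "*+?".indexOf(src.charAt(pos)) < 0) return a;
        char op = src.charAt(pos++);
        State s = new State(), e = new State();
        eps(s, a.start); eps(a.end, e);
        if (op != '+') eps(s, e);            // zero occurrences allowed
        if (op != '?') eps(a.end, a.start);  // repetition allowed
        return new Frag(s, e);
    }

    // atom := '(' alternation ')' | '[' chars ']' | single literal character
    private Frag atom() {
        char c = src.charAt(pos++);
        if (c == '(') {
            Frag inner = alternation();
            if (!more() || src.charAt(pos++) != ')') throw new IllegalArgumentException("missing ')'");
            return inner;
        }
        String chars = String.valueOf(c);
        if (c == '[') {
            int close = src.indexOf(']', pos);
            if (close <= pos) throw new IllegalArgumentException("bad character class");
            chars = src.substring(pos, close);
            pos = close + 1;
        }
        State s = new State(), e = new State();
        s.edges.add(new Edge(chars, e));
        return new Frag(s, e);
    }

    static String sample(String regex, Random rng) {
        RegexSampler parser = new RegexSampler(regex);
        Frag nfa = parser.alternation();
        if (parser.more()) throw new IllegalArgumentException("unbalanced ')'");
        StringBuilder out = new StringBuilder();
        // The accepting state (nfa.end) has no outgoing edges, so the walk stops there.
        for (State cur = nfa.start; cur != nfa.end; ) {
            Edge e = cur.edges.get(rng.nextInt(cur.edges.size()));
            if (e.chars != null) out.append(e.chars.charAt(rng.nextInt(e.chars.length())));
            cur = e.target;
        }
        return out.toString();
    }
}
```

Calling RegexSampler.sample("[ab]*c", new Random()) repeatedly returns random strings matched by that pattern. Because each loop is exited with probability 1/2 at every visit, short strings are much more likely than long ones; a different weighting of the transitions would change that distribution.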

answered Oct 07 '22 15:10 by Stephan202