Yup, you read that right. I need something that is capable of generating random text from a regular expression. The text should be random, but still be matched by the regular expression. It seems no such library exists, but I could be wrong.
Just an example: that library would be capable of taking '[ab]*c' as input, and generating samples such as:
abc
abbbc
bac
etc.
Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.
To match a character that has special meaning in a regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . To match "\" (backslash) itself, use regex \\ .
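Since the Xeger snippet in this thread is Java, here is a minimal Java sketch of the same escaping rules. One extra wrinkle: a Java string literal needs its own backslash escaping, so the regex \. is written "\\." in source code. The class name EscapeDemo and the helper fullMatch are made up for illustration; Pattern.matches and Pattern.quote are standard java.util.regex calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the java.util.regex calls are standard API.
class EscapeDemo {
    // Returns true if the whole input is matched by the given regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Regex \. matches a literal dot; an unescaped . matches any character.
        System.out.println(fullMatch("a\\.b", "a.b")); // true
        System.out.println(fullMatch("a\\.b", "axb")); // false
        System.out.println(fullMatch("a.b", "axb"));   // true

        // Regex \+ and \( match "+" and "(".
        System.out.println(fullMatch("1\\+1", "1+1")); // true
        System.out.println(fullMatch("\\(x", "(x"));   // true

        // Regex \\ matches one backslash; in Java source that is four backslashes.
        System.out.println(fullMatch("\\\\", "\\"));   // true

        // Pattern.quote escapes a whole string so that it is matched literally.
        System.out.println(fullMatch(Pattern.quote("a.b(c)+"), "a.b(c)+")); // true
    }
}
```

If the text to match comes from user input, Pattern.quote is usually safer than escaping individual characters by hand.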
The match() function of Python's re module checks for a match only at the beginning of the string; it does not scan the rest of the string for an occurrence. If the pattern matches at the start of the string, match() returns a match object; otherwise it returns None. To find the first occurrence anywhere in the string, use re.search() instead.
(?=...) is a positive lookahead, a type of zero-width assertion. What it's saying is that the match must be followed by whatever is within the parentheses, but that part isn't included in the match. Your example means the match needs to be followed by zero or more characters and then a digit (but again, that part isn't included in the match).
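A small Java sketch may make the zero-width behaviour concrete. The pattern foo(?=.*\d) is only a guess at the kind of example being discussed, since the original example is not quoted here; LookaheadDemo and firstMatch are hypothetical names, while Pattern, Matcher, find() and group() are standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating a positive lookahead.
class LookaheadDemo {
    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "foo" must be followed by any characters and then a digit.
        // The lookahead is checked but not consumed, so only "foo" is returned.
        System.out.println(firstMatch("foo(?=.*\\d)", "foobar7")); // foo

        // No digit follows "foo", so the assertion fails and nothing matches.
        System.out.println(firstMatch("foo(?=.*\\d)", "foobar"));  // null
    }
}
```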
I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)
This is the way you use it:
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:
Write a parser for regular expressions (you may want to start out with a restricted class of regexes).
Use the result to construct an NFA.
(Optional) Convert the NFA to a DFA.
Randomly traverse the resulting automaton from the start state to any accepting state, recording the characters emitted by each transition.
The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
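As a rough illustration of the last step only, here is a Java sketch that skips the parser and the NFA construction and instead walks a hand-built automaton for the question's example regex [ab]*c. Everything in it (the class RandomWalkDemo, the Edge type, the transition table) is an assumption made for this sketch; it is not how Xeger or any other library is implemented.

```java
import java.util.Random;

// Hypothetical sketch: a hand-built automaton for [ab]*c, traversed at random.
class RandomWalkDemo {
    // One transition: emit `symbol` and move to state `target`.
    static final class Edge {
        final char symbol;
        final int target;

        Edge(char symbol, int target) {
            this.symbol = symbol;
            this.target = target;
        }
    }

    static final int START = 0;
    static final int ACCEPT = 1;

    // Transition table for [ab]*c, indexed by state.
    static final Edge[][] TRANSITIONS = {
        { new Edge('a', START), new Edge('b', START), new Edge('c', ACCEPT) }, // state 0
        { }                                                                    // state 1 (accepting)
    };

    // Walks from START until ACCEPT is reached, picking outgoing edges uniformly at random.
    static String generate(Random random) {
        StringBuilder out = new StringBuilder();
        int state = START;
        while (state != ACCEPT) {
            Edge[] edges = TRANSITIONS[state];
            Edge edge = edges[random.nextInt(edges.length)];
            out.append(edge.symbol);
            state = edge.target;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            String sample = generate(random);
            // Every generated sample should be accepted by the original regex.
            System.out.println(sample + " matches: " + sample.matches("[ab]*c"));
        }
    }
}
```

In this automaton the accepting state has no outgoing edges, so stopping as soon as it is reached is enough. A general implementation would also have to decide at random whether to stop or continue in accepting states that have further transitions, and would probably want a length limit for patterns with unbounded repetition.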