For testing purposes on a project I'm working on, I have a need to, if given a regular expression, randomly generate a string that will FAIL to be matched by it. For instance, if I'm given this regex:
^[abcd]d+
Then I should be able to generate strings such as:
hnbbad
uduebbaef
9f8;djfew
skjcc98332f
...each of which does NOT match the regex, but NOT generate:
addr32
bdfd09usdj
cdddddd-9fdssee
...each of which DO. In other words, I want something like an anti-Xeger.
Does such a library exist, preferably in Python (if I can understand the theory, I can most likely convert it to Python if need be)? I gave some thought to how I could write this, but given the scope of regular expressions, it seemed that might be a much harder problem than what things like Xeger can tackle. I also looked around for a pre-made library to do this, but either I'm not using the right keywords to search or nobody's had this problem before.
To match any character except a list of excluded characters, put the excluded charaters between [^ and ] . The caret ^ must immediately follow the [ or else it stands for just itself.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .
Just enter your regexp in the field below, press the Generate Text button, and you'll get a random text that matches your regexp. Press a button – invert a regexp. No ads, nonsense, or garbage.
A regular expression followed by an asterisk ( * ) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.
My initial instinct is, no, such a library does not exist because it's not possible. You can't be sure that you can find a valid input for any arbitrary regular expression in a reasonable amount of time.
For example, proving whether a number is prime is believed to be a hard to solve mathematical problem. The following regular expression matches any string which is at least 10000 characters long and whose total length is a prime number:
(?!(..+)\1+$).{10000}
I doubt that any library exists that can find a valid input to this regular expression in reasonable time. And this is a very easy example with a simple solution, e.g. 'x' * 10007
will work. It would be possible to come up with other regular expressions that are much harder to find valid inputs for.
I think the only way you are going to solve this is if you limit yourself to some subset of all possible regular expressions.
But having said that if you have a magical library that generates text that matches for any arbitrary regular expression then all you need to do is generate a regular expression that matches all the strings that don't match your original expression.
Luckily this is possible using a negative lookahead:
^(?![\s\S]*(?:^[abcd]d+))
If you are willing to change the requirements to only allow a limited subset of regular expressions then you can negate the regular expression by using boolean logic. For example if ^[abcd]d+
becomes ^[^abcd]|^[abcd][^d]
. It is then possible to find a valid input for this regular expression in reasonable time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With