Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a program that inputs a regular expression and outputs strings that satisfy that regular expression

I think that the title accurately summarizes my question, but just to elaborate a bit.

Instead of using a regular expression to verify properties of existing strings, I'd like to use the regular expression as a way to generate strings that have certain properties.

Note: The function doesn't need to generate every string that satisfies the regular expression (cause that would be an infinite number of string for a lot of regexes). Just a sampling of the many valid strings is sufficient.

How feasible is something like this? If the solution is too complicated/large, I'm happy with a general discussion/outline. Additionally, I'm interested in any existing programs or libraries (.NET) that do this.

like image 245
brad Avatar asked Oct 05 '09 22:10

brad


2 Answers

Well a regex is convertible to a DFA which can be thought of as a graph. To generate a string given this DFA-graph you'd just find a path from a start state to an end state. You'd just have to think about how you want to handle cycles (Maybe traverse every cycle at least once to get a sampling? n times?), but I don't see why it wouldn't work.

like image 50
Falaina Avatar answered Nov 10 '22 16:11

Falaina


This utility on UtilityMill will invert some simple regexen. It is based on this example from the pyparsing wiki. The test cases for this program are:

[A-EA]
[A-D]*
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
foobar\d\d
foobar{2}
foobar{2,9}
fooba[rz]{2}
(foobar){2}
([01]\d)|(2[0-5])
([01]\d\d)|(2[0-4]\d)|(25[0-5])
[A-C]{1,2}
[A-C]{0,3}
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}
@|TH[12]
@(@|TH[12])?
@(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9]))?
@(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9])|OH(1[0-9]?|2[0-9]?|30?|[4-9]))?
(([ECMP]|HA|AK)[SD]|HS)T
[A-CV]{2}
A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]
(a|b)|(x|y)
(a|b) (x|y)
like image 2
PaulMcG Avatar answered Nov 10 '22 16:11

PaulMcG