Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for finding strings that a specific Regex will match

Tags:

c#

regex

Given a regex pattern, I'm trying to find a string that matches it. Similar to how Django reverses them, but in C#. Are there any pre-made C# libraries that do this?


Edit: Moving this project to Google code pretty soon.

Current Test Results

^abc$                     > abc                  : pass
\Aa                       > a                    : pass
z\Z                       > z                    : pass
z\z                       > z                    : pass
z\z                       > z                    : pass
\G\(a\)                   > \(a\)                : pass
ab\b                      > ab                   : pass
a\Bb                      > ab                   : pass
\a                        >                     : pass
[\b]                      >                    : pass
\t                        > \t                   : pass
\r                        > \r                   : pass
\v                        > ♂                    : pass
\f                        > \f                   : pass
\n                        > \n                   : pass
\e                        > ←                    : pass
\141                      > a                    : pass
\x61                      > a                    : pass
\cC                       > ♥                    : pass
\u0061                    > a                    : pass
\\                        > \\                   : pass
[abc]                     > a                    : pass
[^abc]                    > î                    : pass
[a-z]                     > a                    : pass
.                         > p                    : pass
\w                        > W                    : pass
\W                        > ☻                    : pass
\s                        > \n                   : pass
\S                        > b                    : pass
\d                        > 4                    : pass
\D                        > G                    : pass
(a)\1                     > aa                   : pass
(?<n>a)\k<n>              > aa                   : pass
(?<n>a)\1                 > aa                   : pass
(a)(?<n>b)\1\2            > abab                 : pass
(?<n>a)(b)\1\2            > abba                 : pass
(a(b))\1\2                > ababb                : pass
(a(b)(c(d)))\1\2\3\4      > abcdabcdbcdd         : pass
a\0                       > a                    : pass
ab*                       > a                    : pass
ab+                       > abbb                 : pass
ab?                       > a                    : pass
ab{2}                     > abb                  : pass
ab{2,}                    > abbbbbbbbb           : pass
ab{2,3}                   > abb                  : pass
ab*?                      > abb                  : pass
ab+?                      > abbbbb               : pass
ab??                      > a                    : pass
ab{2}?                    > abb                  : pass
ab{2,}?                   > abbbbbbbbb           : pass
ab{2,3}?                  > abbb                 : pass
/users(?:/(?<id>\d+))?    > /users/77            : pass
Passed 52/52 tests.
like image 243
mpen Avatar asked Nov 28 '10 19:11

mpen


People also ask

Which algorithm is used for string matching?

Hashing-string matching algorithms: Rabin Karp Algorithm: It matches the hash value of the pattern with the hash value of current substring of text, and if the hash values match then only it starts matching individual characters.

How do you check if a regex matches a string?

If you need to know if a string matches a regular expression RegExp , use RegExp.prototype.test() . If you only want the first match found, you might want to use RegExp.prototype.exec() instead.

What algorithm is used with regex?

Most library implementations of regular expressions use a backtracking algorithm that can take an exponential amount of time on some inputs.

How do you match expressions in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).


2 Answers

see for example Using Regex to generate Strings rather than match them

also you can take a look at http://en.wikipedia.org/wiki/Deterministic_finite-state_machine especially at "Accept and Generate modes" section.

as others noted you will need to create a DFA from your regular expression and then generate your strings using this DFA.

to convert your regular expression to DFA, generate NFA first (see for example http://lambda.uta.edu/cse5317/spring01/notes/node9.html) and then convert NFA to DFA.

the easiest way i see is to use a parser generator program for that. i do not think django does this.

hope this helps.

like image 188
akonsu Avatar answered Oct 26 '22 15:10

akonsu


"Are there any pre-made C# libraries that do this?"

NO

(I expect this will be accepted as the answer momentarily)

like image 41
abelenky Avatar answered Oct 26 '22 13:10

abelenky