Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

is there any compiler that can convert regexp to fsm? or could convert to human words?

Something that can convert

r"a+|(?:ab+c)"

to

{
    (1, 'a') : [2, 3],
    (2, 'a') : [2],
    (3, 'b') : [4, 3],
    (4, 'c') : [5]
}

or something similar

and accepting in 2 or 5

like image 859
somethin Avatar asked Jun 24 '12 08:06

somethin


2 Answers

You have a debug flag that prints your regexp in a more readable form:

>>> import re
>>> re.compile(r"a+|(?:ab+c)", flags=re.DEBUG)
branch
  max_repeat 1 65535
    literal 97
or
  subpattern None
    literal 97
    max_repeat 1 65535
      literal 98
    literal 99
<_sre.SRE_Pattern object at 0x0000000002325328>
like image 129
kosii Avatar answered Nov 05 '22 04:11

kosii


i have some code that will do this. it's not well documented and it's not supported, but if you're interested you're welcome to look at it.

the library is called rxpy and the repository is http://code.google.com/p/rxpy

the routine that does parsing is parse_pattern at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/pattern.py#871

if you call repr(...) on the result from that you get a graph in the "dot language" - https://en.wikipedia.org/wiki/DOT_language

for example, see the tests as http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#47

to show what i mean ,let's look at the test at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#234 which is for 'ab*c':

"""digraph {
 0 [label="a"]
 1 [label="...*"]
 2 [label="b"]
 3 [label="c"]
 4 [label="Match"]
 0 -> 1
 1 -> 2
 1 -> 3
 3 -> 4
 2 -> 1
}"""

that starts at 0 which can match an "a" to go to state 1. from there you can match a "b" to go to state 2 or a "c" to go to state 3. state 2 then has a transition back to 1 that can consume another "b", etc etc. it's a bit ugly to read by hand, but when the test fails you get a little graph displayed on the screen.

the library also has various "engines" which will match strings against this graph (and so do regular expression matching). but it is much slower than the python library (because it is pure python).

this is not supported and may not be very clear - sorry - but i think it's close to what you want and you're welcome to use it if it's useful (MPL or LGPL licence).

like image 4
andrew cooke Avatar answered Nov 05 '22 02:11

andrew cooke