
How to express branch in Rebol PARSE dialect?

I have a mysql schema like below:

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}

Now I want to extract some info from it: the field name, the type, and the comment, if any. See below:

["id" "int" "" "name" "varchar" "the name" "content" "text" "something" ]

My code is:

parse data [
    any [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

but I get something like this:

["id" "int" "the name" "content" "text" "something"]

I know the opt ... line is not right.

I want to express this: if the COMMENT keyword is found first, extract the comment info; if a line feed is found first, continue with the next loop iteration. But I don't know how to express it. Can anyone help?

Wayne Cui asked May 24 '15 11:05


4 Answers

I much favour (where possible) building up a set of grammar rules with positive terms to match target input; I find it's more literate, precise, flexible and easier to debug. In your snippet above, we can identify five core components:

space: use [space][
    space: charset "^-^/ "
    [some space]
]

word: use [letter][
    letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
    [some letter]
]

id: use [letter][
    letter: complement charset "`"
    [some letter]
]

number: use [digit][
    digit: charset "0123456789"
    [some digit]
]

string: use [char][
    char: complement charset "'"
    [any [some char | "''"]]
]
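
Each term can be checked on its own before it goes into the larger grammar. This is only a quick sketch: it assumes the five definitions above have already been evaluated, and the sample strings are made up for illustration.

```rebol
; Assumes SPACE, WORD, ID, NUMBER and STRING are defined as above.
; Each PARSE returns true only if the rule matches the whole input.
probe parse/all "varchar" word      ; letters and underscores only
probe parse/all "10" number         ; one or more digits
probe parse/all "the name" string   ; anything except a single quote
probe parse/all "it''s" string      ; a doubled quote counts as an escaped quote
probe parse/all "a`b" id            ; should fail: an id cannot contain a backtick
```

If a term misbehaves here, it is much easier to fix than when it is buried in the full rule.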

With terms defined, writing a rule that describes the grammar of the input is relatively trivial:

result: collect [
    parsed?: parse/all data [ ; parse/all for Rebol 2 compatibility
        opt space
        some [
            (field: type: none comment: copy "")
            "`" copy field id "`"
            space 
            copy type word opt ["(" number ")"]
            any [
                space [
                    "COMMENT" space "'" copy comment string "'"
                    | word | "'" string "'" | number
                ]
            ]
            opt space "," (keep reduce [field type comment])
            opt space
        ]
    ]
]

As an added bonus, we can validate the input: parsed? is true only when the whole of data matches the grammar.

if parsed? [new-line/all/skip result true 3]

One wee application of new-line to smarten things up a little should yield:

== [
    "id" "int" "" 
    "name" "varchar" "the name" 
    "content" "text" "something"
]
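
In case new-line is unfamiliar: it only sets line-break markers on a block, it does not change the values. A small sketch with a made-up block of numbers, where /skip 3 puts a marker in front of every third value:

```rebol
; Hypothetical sample data, just to show the effect of the refinements.
; /all  - apply the marker setting across the whole block
; /skip - treat the block as records of the given size (here 3)
sample: copy [1 2 3 4 5 6]
new-line/all/skip sample true 3
probe sample    ; should display one record of three values per line
```

The block still holds the same six values; only the way it is molded and printed changes.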
rgchris answered Oct 17 '22 21:10


I think this is closer to what you are after.

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
temp: []
parse data [
  any [ 
    thru {`} copy field to {`} {`}
    some space copy field-type to [ {(} | space]
    (comm: copy "")
    opt [ thru {COMMENT} some space thru {'} copy comm to {'}]
    (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
  ]
]
probe temp

To break down the differences:

  1. Set temp to an empty block before parsing.
  2. Changed thru some space to just some space, as this moves forward through the series in the same way. Note that the following returns false:

    parse "   " [ thru some space ]
    
  3. Changed comm: "" to comm: copy "" to make sure you get a new string each time you extract the comment (does not seem to affect the output, but is good practice)

  4. Changed {COMMENT} thru some space to {COMMENT} some space, as per point 2.
  5. Added a probe at the end for debugging.

As a note, you can use ?? (almost) anywhere in a parse rule to help with debugging; it shows you your current position in the input.
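
A minimal sketch of what that looks like, assuming Rebol 3 (where ?? is a parse keyword); the input string is a shortened, made-up version of one schema line:

```rebol
; Assumption: Rebol 3 PARSE. Each ?? prints the rule that follows it
; together with the input at the current position, then parsing continues.
parse "`id` int(10)" [
    thru {`} ?? copy field to {`}
    ?? skip
    to end
]
probe field
```

Dropping a ?? just before the rule you suspect is usually enough to see where the position has drifted.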

johnk answered Oct 17 '22 22:10


Use parse/all for string parsing:

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
nodata:   charset { ()'}
dat: complement nodata

collect [   
    parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip 
            some " " copy type some dat ( keep type   comm:  copy "" )  
            copy rest thru "," (
                parse/all rest [
                    some [
                        [","   (keep comm) ]  
                     |  ["COMMENT"   some nodata copy comm to "'"  ]
                     |  skip                        
                    ]
                ]
            )
        ]
    ]
]
== ["id" "int" "" "name" "varchar" "the name" "content" "text" "something"]
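
The inner rule above is where the "branch" happens: alternatives separated by | are tried in order at each position, with skip as the fallback that moves one character forward. A stripped-down sketch of just that idea, assuming Rebol 2 or 3 and using two sample lines copied from the schema:

```rebol
; Try "COMMENT" first at every position; if it is not there, skip one
; character and try again. COMM keeps its default when no COMMENT is found.
foreach line [
    {`id` int(10) unsigned NOT NULL AUTO_INCREMENT,}
    {`name` varchar(10) DEFAULT '' COMMENT 'the name',}
][
    comm: copy ""
    parse/all line [
        some [
            "COMMENT" thru "'" copy comm to "'" to end
            | skip
        ]
    ]
    probe comm
]
```

The first line has no COMMENT, so comm stays empty; for the second line it should come out as the text between the quotes.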

Another (better) solution with pure parse:

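collect [
    probe parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip
            ; FURTHER starts out as an empty rule, which always succeeds
            some " " copy type some dat (keep type  comm: ""  further: [])
            some [
                ; end of the column definition: emit the comment, then make
                ; FURTHER a rule that always fails, so the next pass through
                ; this inner loop fails and control returns to the outer loop
                "," (keep comm  further: [to end skip])
                | ["COMMENT" some nodata copy comm to "'"]
                | skip further
            ]
        ]
    ]
]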
sqlab answered Oct 17 '22 20:10


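I figured out an alternative: read the data as a block! of lines rather than as a single string!, and parse each line separately. Because each parse sees only one line, thru {COMMENT} cannot skip ahead into the next column's definition.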

data: read/lines %data.txt
probe data
temp: copy []

foreach d data [
    parse d [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

probe temp
Wayne Cui answered Oct 17 '22 21:10