How to express branch in Rebol PARSE dialect?

Question

I have a MySQL schema like the one below:

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}

Now I want to extract some info from it: the field name, type and comment (if any). See below:

["id" "int" "" "name" "varchar" "the name" "content" "text" "something" ]

My code is:

parse data [
    any [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

But I get something like this:

["id" "int" "the name" "content" "text" "something"]

I know the opt ... line is not right.

What I want to express is: if the COMMENT keyword is found first, extract the comment; if a line feed is found first, continue with the next iteration of the loop. But I don't know how to express this in PARSE. Can anyone help?

rgchris · Accepted Answer

I much favour (where possible) building up a set of grammar rules with positive terms to match target input; I find this is more literate, precise, flexible and easier to debug. In your snippet above, we can identify five core components:

space: use [space][
    space: charset "^-^/ "
    [some space]
]

word: use [letter][
    letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
    [some letter]
]

id: use [letter][
    letter: complement charset "`"
    [some letter]
]

number: use [digit][
    digit: charset "0123456789"
    [some digit]
]

string: use [char][
    char: complement charset "'"
    [any [some char | "''"]]
]
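The string term is the only one with a twist: a doubled quote ('') is accepted inside the value as an escaped quote, as in SQL. A quick way to check a term on its own is to parse sample strings against it; the samples and the captured word s below are only illustrative:

```rebol
; Same definition as above: runs of non-quote characters,
; or a doubled quote standing for an escaped single quote
string: use [char] [
    char: complement charset "'"
    [any [some char | "''"]]
]

probe parse/all "it''s" string    ; true: '' is accepted as an escape
probe parse/all "it's" string     ; false: a lone quote ends the term early

; Inside a quoted value, the closing quote is the first one that is not doubled
probe parse/all "'it''s'" ["'" copy s string "'"]   ; true
probe s                                             ; "it''s"
```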

With terms defined, writing a rule that describes the grammar of the input is relatively trivial:

result: collect [
    parsed?: parse/all data [ ; parse/all for Rebol 2 compatibility
        opt space
        some [
            (field: type: none comment: copy "")
            "`" copy field id "`"
            space 
            copy type word opt ["(" number ")"]
            any [
                space [
                    "COMMENT" space "'" copy comment string "'"
                    | word | "'" string "'" | number
                ]
            ]
            opt space "," (keep reduce [field type comment])
            opt space
        ]
    ]
]

As an added bonus, we can validate the input.

if parsed? [new-line/all/skip result true 3]

One wee application of new-line to smarten things up a little should yield:

== [
    "id" "int" "" 
    "name" "varchar" "the name" 
    "content" "text" "something"
]
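The branch the question asks about is the alternation inside any [space [...]]: at each token, PARSE tries "COMMENT" first and only falls back to a plain word if that fails. Order matters, because "COMMENT" would also match word. A cut-down sketch on a single attribute string, reusing the terms above (attrs, note-first and note-last are illustrative names):

```rebol
; Same space, word and string terms as defined above
space: use [space] [
    space: charset "^-^/ "
    [some space]
]
word: use [letter] [
    letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
    [some letter]
]
string: use [char] [
    char: complement charset "'"
    [any [some char | "''"]]
]

attrs: " NOT NULL COMMENT 'the name'"

; "COMMENT" is tried before word, so the comment branch wins
note-first: copy ""
parse/all attrs [
    any [space ["COMMENT" space "'" copy note-first string "'" | word | "'" string "'"]]
]
probe note-first    ; "the name"

; With word tried first, COMMENT is swallowed as an ordinary word
; and the quoted text falls through to the plain-string branch
note-last: copy ""
parse/all attrs [
    any [space [word | "COMMENT" space "'" copy note-last string "'" | "'" string "'"]]
]
probe note-last     ; ""
```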

johnk · Answer

I think this is closer to what you are after.

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
temp: []
parse data [
  any [ 
    thru {`} copy field to {`} {`}
    some space copy field-type to [ {(} | space]
    (comm: copy "")
    opt [ thru {COMMENT} some space thru {'} copy comm to {'}]
    (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
  ]
]
probe temp

To break down the differences:

1. Set up temp as an empty block (temp: []) so there is something to append to.
2. Changed thru some space to just some space, as this moves forward through the series in the same way. Note that the following returns false:
   ```
   parse "   " [ thru some space ]
   ```
3. Changed comm: "" to comm: copy "" to make sure you get a new string each time you extract the comment (this does not seem to affect the output, but it is good practice).
4. Changed {COMMENT} thru some space to {COMMENT} some space, as per point 2.
5. Added a probe at the end for debugging.

As a note, you can put ?? (almost) anywhere in a parse rule to help with debugging; it shows you your current position in the input.
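If ?? is not available in your Rebol version, a portable alternative is to capture the position with a set-word and print it from a paren. A minimal sketch on the first line of the sample data (sample and here are illustrative names):

```rebol
sample: "`id` int(10) unsigned NOT NULL AUTO_INCREMENT,"
parse/all sample [
    thru {`} copy field to {`} {`}
    here: (print ["after the field name:" mold here])   ; here marks the current position
    to end
]
```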

sqlab · Answer

Use parse/all for string parsing (in Rebol 2, plain parse skips whitespace between tokens):

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
nodata:   charset { ()'}
dat: complement nodata

collect [   
    parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip 
            some " " copy type some dat ( keep type   comm:  copy "" )  
            copy rest thru "," (
                parse/all rest [
                    some [
                        [","   (keep comm) ]  
                     |  ["COMMENT"   some nodata copy comm to "'"  ]
                     |  skip                        
                    ]
                ]
            )
        ]
    ]
]
== ["id" "int" "" "name" "varchar" "the name" "content" "text" "something"]
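The two charsets do most of the work here: dat matches any character except a space, a parenthesis or a single quote, so copy type some dat stops at the first ( or space after the type name. A quick check against fragments of the sample data (kinds and kind are illustrative names):

```rebol
nodata: charset { ()'}
dat: complement nodata

kinds: copy []
foreach sample ["int(10) unsigned" "varchar(10) DEFAULT ''" "text COMMENT 'something'"] [
    ; some dat consumes the type name and stops at "(" or " "
    parse/all sample [copy kind some dat to end]
    append kinds kind
]
probe kinds   ; ["int" "varchar" "text"]
```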

Another (better) solution, using pure parse:

collect [   
    probe parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip 
            some " " copy type some dat ( keep type   comm:  ""  further: [])  
            some [ 
            ","   (keep comm  further:  [ to end  skip]) 
            |  ["COMMENT"   some nodata copy comm to "'"  ]
            |  skip  further                     
            ]
        ]
    ]
]
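The word further is what makes this version work: it starts as an empty rule, which always succeeds, and once a comma has been matched it is swapped for [to end skip], a rule that can never succeed (skip fails at the end of input). The skip further alternative then fails, and the inner some loop stops just after the comma. The same trick in isolation, with mark as an illustrative name:

```rebol
further: []                            ; empty rule: always succeeds
parse/all "x,y," [
    some [
        "," (further: [to end skip])   ; after a comma, make further unmatchable
        | skip further                 ; fails once further can no longer match
    ]
    mark: to end                       ; remember where the loop stopped
]
probe mark   ; "y,"
```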

Wayne Cui · Answer

I figured out an alternative: read the data as a block! of lines rather than a single string!, and parse each line separately.

data: read/lines %data.txt
probe data
temp: copy []

foreach d data [
    parse d [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

probe temp
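Parsing one line at a time works because thru {COMMENT} can no longer skip into the following line, which appears to be what attached "the name" to id in the original attempt. A minimal sketch of the difference, using an inline string instead of data.txt (schema, rule, whole-comm and line-comm are illustrative names):

```rebol
; Two schema lines in one string; only the second has a COMMENT
schema: "`id` int,^/`name` varchar COMMENT 'the name',^/"
rule: [thru {`id`} opt [thru {COMMENT} thru {'} copy comm to {'}] to end]

; On the whole string, thru {COMMENT} runs past the end of the id line
; and picks up the comment that belongs to the name line
comm: copy ""
parse/all schema rule
whole-comm: comm
probe whole-comm   ; "the name"

; On the id line alone there is no COMMENT, so opt fails harmlessly
comm: copy ""
parse/all "`id` int," rule
line-comm: comm
probe line-comm    ; ""
```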

Tags: parsing, rebol, rebol3

Wayne Cui