I have a MySQL schema like the one below:
data: {
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT '' COMMENT 'the name',
`content` text COMMENT 'something',
}
Now I want to extract some information from it: the field name, the type, and the comment, if any. The desired result is:
["id" "int" "" "name" "varchar" "the name" "content" "text" "something" ]
My code is:
parse data [
any [
thru {`} copy field to {`} {`}
thru some space copy field-type to [ {(} | space]
(comm: "")
opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
(repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
]
]
But I get something like this:
["id" "int" "the name" "content" "text" "something"]
I know the line starting with opt .. is not right. What I want to express is: if the COMMENT keyword is found first, extract the comment; if a line feed is found first, continue with the next iteration of the loop. But I don't know how to express that. Can anyone help?
Where possible, I much favour building up a set of grammar rules with positive terms to match the target input. I find it more literate, precise, flexible and easier to debug. In your snippet above, we can identify five core components:
space: use [space][
space: charset "^-^/ "
[some space]
]
word: use [letter][
letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
[some letter]
]
id: use [letter][
letter: complement charset "`"
[some letter]
]
number: use [digit][
digit: charset "0123456789"
[some digit]
]
string: use [char][
char: complement charset "'"
[any [some char | "''"]]
]
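Each of these terms can be tried on its own before it is combined into a larger rule, which is part of what makes this style easier to debug. A minimal sketch, assuming Rebol 2 (hence parse/all) and repeating the number term so the snippet stands alone:

```rebol
; The number term, repeated from the definitions above so this runs on its own
number: use [digit][
    digit: charset "0123456789"
    [some digit]
]

print parse/all "10" number    ; true: the whole input is digits
print parse/all "1x" number    ; false: "x" is left unmatched
```

The same check works for the other terms, for example parse/all "varchar" word.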
With terms defined, writing a rule that describes the grammar of the input is relatively trivial:
result: collect [
parsed?: parse/all data [ ; parse/all for Rebol 2 compatibility
opt space
some [
(field: type: none comment: copy "")
"`" copy field id "`"
space
copy type word opt ["(" number ")"]
any [
space [
"COMMENT" space "'" copy comment string "'"
| word | "'" string "'" | number
]
]
opt space "," (keep reduce [field type comment])
opt space
]
]
]
As an added bonus, we can validate the input.
if parsed? [new-line/all/skip result true 3]
One wee application of new-line to smarten things up a little should yield:
== [
"id" "int" ""
"name" "varchar" "the name"
"content" "text" "something"
]
I think this is closer to what you are after.
data: {
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT '' COMMENT 'the name',
`content` text COMMENT 'something',
}
temp: []
parse data [
any [
thru {`} copy field to {`} {`}
some space copy field-type to [ {(} | space]
(comm: copy "")
opt [ thru {COMMENT} some space thru {'} copy comm to {'}]
(repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
]
]
probe temp
To break down the differences:
1. Defined temp (temp: []) before parsing.
2. Changed thru some space to just some space, as this moves forward through the series in the same way. Note that the following returns false:
parse " " [ thru some space ]
3. Changed comm: "" to comm: copy "" to make sure you get a new string each time you extract the comment (it does not seem to affect the output, but is good practice).
4. Changed {COMMENT} thru some space to {COMMENT} some space, as per point 2.
As a note, you can use ?? (almost) anywhere in a parse rule to help with debugging; it shows your current position.
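On the comm: copy "" change: a literal string in Rebol code is a single series that is reused every time the code runs, so modifying it in place carries over between runs. The sketch below illustrates this outside of parse; the function names shared and fresh are hypothetical and exist only for this example.

```rebol
; Hypothetical helpers, for illustration only
shared: func [] [
    s: ""           ; the same literal string is reused on every call
    append s "x"
]
fresh: func [] [
    s: copy ""      ; a new empty string on every call
    append s "x"
]

shared
print shared    ; xx  (the literal kept the "x" from the first call)
fresh
print fresh     ; x
```

In the parse rule itself, copy comm to {'} rebinds comm to a new string rather than appending to the old one, which would explain why the output is the same either way.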
Use parse/all for string parsing:
data: {
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT '' COMMENT 'the name',
`content` text COMMENT 'something',
}
nodata: charset { ()'}
dat: complement nodata
collect [
parse/all data [
some [
thru {`} copy field to {`} (keep field) skip
some " " copy type some dat ( keep type comm: copy "" )
copy rest thru "," (
parse/all rest [
some [
["," (keep comm) ]
| ["COMMENT" some nodata copy comm to "'" ]
| skip
]
]
)
]
]
]
== ["id" "int" "" "name" "varchar" "the name" "content" "text" "something"]
Another (better) solution with pure parse:
collect [
probe parse/all data [
some [
thru {`} copy field to {`} (keep field) skip
some " " copy type some dat ( keep type comm: "" further: [])
some [
"," (keep comm further: [ to end skip])
| ["COMMENT" some nodata copy comm to "'" ]
| skip further
]
]
]
]
I figured out an alternative way: read the data as a block! of lines rather than as a single string!.
data: read/lines %data.txt ; file literals need a leading %
probe data
temp: copy []
foreach d data [
parse d [
thru {`} copy field to {`} {`}
thru some space copy field-type to [ {(} | space]
(comm: "")
opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
(repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
]
]
probe temp
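A side note on the either comm [...] [...] test used in several of the snippets above: in Rebol only none and false count as false, so an empty string still selects the first branch. That is harmless here because both branches end up appending an empty string, but an explicit emptiness test may be what was intended. A minimal sketch:

```rebol
comm: ""

; An empty string is still a "true" value, so the first branch runs
print either comm ["first branch"] ["second branch"]    ; first branch

; Testing for emptiness explicitly distinguishes the two cases
print either empty? comm ["no comment"] ["has comment"] ; no comment
```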