I'm trying to use PARSE to turn a CSV line into a Rebol block. Easy enough to write in open code, but as with other questions I am trying to learn what the dialect can do without that.
So if a line says:
"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com
Then I want the block:
[{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]
Issues to notice:
- Embedded quotes are indicated with a quote pair: ""
- Things like http://rebol.com stay as STRING! instead of LOADing them into types such as URL!
To make it more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule which captures a single column terminated by a comma, which may either be in quotes or not.
I know how many columns there should be due to the header line, so the code then says:
unless parse line compose [(column-count) column-rule] [
    print rejoin [{Expected } column-count { columns.}]
]
But I'm a bit stuck on writing column-rule. I need a way in the dialect to express "Once you find a quote, keep skipping quote pairs until you find a quote standing all on its own." What's a good way to do that?
As with most parse problems, I try to build a grammar that best describes the elements of the input format.
In this case, we have nouns:
[comma ending value-chars qmark quoted-chars value header row]
Some verbs:
[row-feed emit-value]
And the operative nouns:
[current chunk current-row width]
I suppose I could break it down a little more, but this is enough to work with. First, the foundation:
comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]
quoted-chars: complement charset reduce [qmark]
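As a quick check of what these two character sets accept, here is a minimal sketch that repeats the definitions above and tries them on a few made-up strings. It assumes Rebol 2 (hence parse/all); the sample strings are illustrative only.

```rebol
comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]
quoted-chars: complement charset reduce [qmark]

; An unquoted value may not contain a quote, a comma or a newline
print parse/all "Hostile Fork" [some value-chars]   ; true
print parse/all "a,b" [some value-chars]            ; false, stops at the comma

; Inside quotes, everything except the quote mark itself is allowed
print parse/all "a,b" [some quoted-chars]           ; true
```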
Now the value structure. Quoted values are built up from chunks of valid chars or quotes as we find them:
current: chunk: none

quoted-value: [
    qmark (current: copy "")
    any [
        copy chunk some quoted-chars (append current chunk)
        |
        qmark qmark (append current qmark)
    ]
    qmark
]

value: [
    copy current some value-chars
    | quoted-value
]

emit-value: [
    (
        delimiter: comma
        append current-row current
    )
]

emit-none: [
    (
        delimiter: comma
        append current-row none
    )
]
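The quoted-value rule above is the part that handles "keep skipping quote pairs until you find a quote standing on its own". As a sketch, it can be exercised in isolation on the quoted column from the question. This assumes Rebol 2, and the word sample is just an illustrative name.

```rebol
qmark: {"}
quoted-chars: complement charset reduce [qmark]
current: chunk: none

quoted-value: [
    qmark (current: copy "")
    any [
        ; a run of ordinary characters is copied as-is
        copy chunk some quoted-chars (append current chunk)
        |
        ; a pair of quotes stands for one literal quote
        qmark qmark (append current qmark)
    ]
    ; a quote that is not followed by another quote closes the value
    qmark
]

sample: {"Look, that's ""MR. Fork"" to you!"}
if parse/all sample quoted-value [print current]
; prints: Look, that's "MR. Fork" to you!
```

The order of the alternatives does not matter much here, because a chunk can never start with a quote and a quote pair can never start with anything else.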
Note that delimiter is set to ending at the beginning of each row, then changed to comma as soon as we pass a value. Thus, an input row is defined as [ending value any [comma value]].
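The delimiter trick works because PARSE looks up a word's value each time it reaches that word in a rule, so a paren action can change what the word matches on the next pass. A stripped-down sketch of the same idea follows; sep, item and items are hypothetical names, and ";" merely stands in for the row-start marker.

```rebol
items: copy []
item: none
sep: ";"    ; what we expect before the first item

ok: parse/all ";a,b,c" [
    some [
        sep copy item skip
        (sep: ","  append items item)   ; from now on, expect a comma
    ]
]

probe ok      ; true
probe items   ; ["a" "b" "c"]
```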
All that remains is to define the document structure:
current-row: none

row-feed: [
    (
        delimiter: ending
        append/only out current-row: copy []
    )
]

width: none

header: [
    (out: copy [])
    row-feed any [
        value comma
        emit-value
    ]
    value body: ending :body
    emit-value
    (width: length? current-row)
]

row: [
    row-feed width [
        delimiter [
            value emit-value
            | emit-none
        ]
    ]
]
if parse/all stream [header some row opt ending][out]
Wrap it up to shield all those words, and you have:
REBOL [
    Title: "CSV Parser"
    Date: 19-Nov-2012
    Author: "Christopher Ross-Gill"
]

parse-csv: use [
    comma ending delimiter value-chars qmark quoted-chars
    value quoted-value header row
    row-feed emit-value emit-none
    out current current-row width
][
    comma: ","
    ending: "^/"
    qmark: {"}
    value-chars: complement charset reduce [qmark comma ending]
    quoted-chars: complement charset reduce [qmark]

    current: none

    quoted-value: use [chunk][
        [
            qmark (current: copy "")
            any [
                copy chunk some quoted-chars (append current chunk)
                |
                qmark qmark (append current qmark)
            ]
            qmark
        ]
    ]

    value: [
        copy current some value-chars
        | quoted-value
    ]

    current-row: none

    row-feed: [
        (
            delimiter: ending
            append/only out current-row: copy []
        )
    ]

    emit-value: [
        (
            delimiter: comma
            append current-row current
        )
    ]

    emit-none: [
        (
            delimiter: comma
            append current-row none
        )
    ]

    width: none

    header: [
        (out: copy [])
        row-feed any [
            value comma
            emit-value
        ]
        value body: ending :body
        emit-value
        (width: length? current-row)
    ]

    row: [
        opt ending end break
        |
        row-feed width [
            delimiter [
                value emit-value
                | emit-none
            ]
        ]
    ]

    func [stream [string!]][
        if parse/all stream [header some row][out]
    ]
]
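A possible way to call it, as a sketch: this assumes the script above has already been evaluated in the same session so that parse-csv is defined, and the header names in the first line are made up for illustration. The header rule expects a value in every column, so the line from the question (which has an empty column) is used as a data row rather than as the header.

```rebol
; Assumes parse-csv from the script above is already loaded.
; The header names below are hypothetical.
sample: rejoin [
    "quote,name,note,url" newline
    {"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com}
]

; On success the result is a block of rows, header row first;
; the function returns none if the input does not match the grammar.
probe parse-csv sample
```

The empty third column should come back as none rather than as an empty string, since it is handled by emit-none.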