I'm trying to follow Appel's "Modern Compiler Implementation in ML" and am writing the lexer using Ocamllex.
The specification asks for the lexer to return strings after translating escape sequences. The following code is an excerpt from the ocamllex input file:
rule tiger = parse
...
| '"'
{ let buffer = Buffer.create 1 in
STRING (stringl buffer lexbuf)
}
and stringl buffer = parse
| '"' { Buffer.contents buffer }
| "\\t" { Buffer.add_char buffer '\t'; stringl buffer lexbuf }
| "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
| "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
| '\\' '"' { Buffer.add_char buffer '"'; stringl buffer lexbuf }
| '\\' '\\' { Buffer.add_char buffer '\\'; stringl buffer lexbuf }
| eof { raise End_of_file }
| _ as char { Buffer.add_char buffer char; stringl buffer lexbuf }
Is there a better way?
You may be interested in looking at how the Ocaml lexer does this (search for and string
). In essence, it's the same method as yours, without the nice local buffer (I find your code nicer on this point, but this is a bit less efficient), a bit more complex because more escaping is supported, and using an escape table (char_for_backslash) to factorize similar rules.
Also, you have the rule "\\n"
repeated twice, and I think 1
is a very pessimistic estimate of your string length, I would rather use 20
here (to avoid needless resizing).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With