The POSIX shell command language is not easy to parse, largely because of tight coupling between lexing and parsing.
Parsing expression grammars (PEGs), however, are often scannerless, so by combining lexing and parsing I might avoid these problems. The language I am using (Rust) has a well-maintained PEG library, but I know of three difficulties that could make it impractical to use:
Is a PEG suited to parsing the shell command language given these requirements, or is a hand-written recursive-descent parser more suitable?
Yes, a PEG can be used, and none of the issues you note should be a problem. In particular:
1) Parsing line by line: most PEG tools have no built-in whitespace skipping. You must handle all whitespace, including newlines, explicitly, which means you can treat a newline any way you like.
2) Aliases: do not use the PEG's parse tree as your AST. Instead, descend the parse tree and build the AST from it. Once the parse has completed and you are building the AST, you can detect an alias and insert its expansion in place.
3) Reserved words are not reserved unless you reserve them. In a context where either a reserved word or another alphanumeric symbol can occur, check for the reserved words explicitly first and the arbitrary alphanumeric symbol second, because once a PEG's ordered choice has matched an alternative, it will not backtrack to try the later ones. Anywhere a reserved word is not permitted, simply don't check for it, and your generalised alphanumeric symbol rule will succeed instead.
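To make points 1 and 3 concrete, here is a minimal hand-rolled sketch of the PEG rules involved, written with the standard library only rather than with any particular PEG crate. All names (`Token`, `RESERVED`, `blanks`, `reserved`, `word`, `command_position`, `argument_position`) are hypothetical, and the reserved-word list is deliberately partial. The sketch also assumes you want a reserved word to match only as a whole word, which needs a negative lookahead so that `iffy` is not read as `if` followed by `fy`.

```rust
/// Token kinds for this sketch (hypothetical, not taken from any PEG crate).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Reserved(&'a str),
    Word(&'a str),
}

/// A partial, illustrative list of shell reserved words.
const RESERVED: [&str; 5] = ["if", "then", "else", "fi", "while"];

fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// blanks <- [ \t]*
/// Newline is deliberately excluded, so the grammar decides what it means.
fn blanks(input: &str) -> &str {
    input.trim_start_matches(|c: char| c == ' ' || c == '\t')
}

/// word <- word_char+
/// Returns the matched text and the remaining input.
fn word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !is_word_char(c))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}

/// reserved <- ("if" / "then" / "else" / "fi" / "while") !word_char
fn reserved(input: &str) -> Option<(&str, &str)> {
    RESERVED.iter().find_map(|kw| {
        let rest = input.strip_prefix(*kw)?;
        // Negative lookahead: fail if the keyword is only a prefix of a longer word.
        if rest.chars().next().map_or(false, is_word_char) {
            None
        } else {
            Some((&input[..kw.len()], rest))
        }
    })
}

/// command_position <- reserved / word
/// Ordered choice: the reserved-word alternative is tried first.
fn command_position(input: &str) -> Option<(Token<'_>, &str)> {
    reserved(input)
        .map(|(s, rest)| (Token::Reserved(s), rest))
        .or_else(|| word(input).map(|(s, rest)| (Token::Word(s), rest)))
}

/// argument_position <- word
/// Reserved words are not checked here, so `if` is just an ordinary word.
fn argument_position(input: &str) -> Option<(Token<'_>, &str)> {
    word(input).map(|(s, rest)| (Token::Word(s), rest))
}
```

With these rules, `command_position("if x")` yields a `Token::Reserved`, while `argument_position("if")` yields a `Token::Word`, because the second rule never looks for reserved words. Under the same assumptions, a PEG library would express each function as a grammar rule with the alternatives in the same order.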