I'm writing a parser for a text-based format in nom 4.2.2, and I'm using the whitespace facility to skip whitespace. I have to use a custom parser because this format treats some unusual characters as whitespace. Following the example on that page, I've made one using eat_separator.
How do I efficiently extend my space parser to also consume line comments from # to end-of-line? These comments can appear anywhere except within strings. I always want to throw away the contents of the comment: there's nothing like pre-processor directives.
That's a tricky issue; I had it as well when writing a Python parser.
Here is how I ended up implementing "line break optionally preceded by a comment":
/// One or more line breaks, each optionally preceded by spaces and a `#` comment
named!(pub newline<StrSpan, ()>,
  map!(
    many1!(
      tuple!(
        spaces_nonl,
        opt!(preceded!(char!('#'), many0!(none_of!("\n")))),
        char!('\n')
      )
    ),
    |_| ()
  )
);
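/// Spaces, escaped newlines, and line breaks (with their optional comments).
/// `escaped_newline` is not shown here; see the linked helpers.rs.
named!(pub spaces_nl<StrSpan, ()>,
  map!(many0!(alt!(one_of!(" \t\x0c") => { |_| () } | escaped_newline | newline)), |_| ())
);

/// Same as `spaces_nl`, but never consumes an unescaped line break
named!(pub spaces_nonl<StrSpan, ()>,
  map!(many0!(alt!(one_of!(" \t\x0c") => { |_| () } | escaped_newline)), |_| ())
);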
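To make it easier to see what these combinators consume, here is a rough, library-independent sketch in plain Rust. It is not nom code: skip_spaces_and_comments is a hypothetical name, and I'm assuming escaped_newline means a backslash immediately followed by a line break. Unlike the newline parser above, this sketch also accepts a comment that ends at end-of-input without a trailing line break.

```rust
/// Hypothetical helper (not part of nom): returns what is left of `input`
/// after skipping blanks, line breaks, backslash-newline continuations,
/// and `#` comments.
fn skip_spaces_and_comments(input: &str) -> &str {
    let mut rest = input;
    loop {
        let before = rest.len();
        // Blanks and line breaks: space, tab, form feed, newline.
        rest = rest.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\x0c' | '\n'));
        // Backslash-newline continuation (assumed meaning of `escaped_newline`).
        if let Some(after) = rest.strip_prefix("\\\n") {
            rest = after;
        }
        // A comment runs from '#' up to, but not including, the line break.
        if rest.starts_with('#') {
            let end = rest.find('\n').unwrap_or(rest.len());
            rest = &rest[end..];
        }
        // Stop once a full pass consumed nothing.
        if rest.len() == before {
            return rest;
        }
    }
}
```

For example, skip_spaces_and_comments("  # note\n\tx = 1") returns "x = 1". A # inside a string literal is never seen by this function as long as the string parser consumes the whole literal itself, which is why comments inside strings need no special handling here.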
With spaces_nl defined, you can rewrite ws! to use it (I copy-pasted the code from nom and replaced the separator argument of sep!):
/// Like `ws!()`, but ignores comments as well
macro_rules! ws_comm (
  ($i:expr, $($args:tt)*) => (
    {
      use nom::Convert;
      use nom::Err;

      match sep!($i, spaces_nl, $($args)*) {
        Err(e) => Err(e),
        Ok((i1, o)) => {
          match spaces_nl(i1) {
            Err(e) => Err(Err::convert(e)),
            Ok((i2, _)) => Ok((i2, o))
          }
        }
      }
    }
  )
);
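In the simplest case (a single inner parser), the effect of this macro is roughly "skip separators, run the parser, skip separators again". The plain-Rust sketch below illustrates only that shape. It is not how nom implements sep!, which also threads the separator through nested combinators such as tuple!. The names skip, ws_comm, and word are hypothetical, and skip uses trim_start, so it treats any Unicode whitespace as a separator, which is broader than spaces_nl.

```rust
/// Hypothetical stand-in for `spaces_nl`: skips whitespace and `#` comments.
fn skip(input: &str) -> &str {
    let mut rest = input.trim_start();
    while rest.starts_with('#') {
        // Drop the comment body, then any whitespace after it.
        let end = rest.find('\n').unwrap_or(rest.len());
        rest = rest[end..].trim_start();
    }
    rest
}

/// Runs `inner` with separators skipped before and after it.
/// Parsers here return `Some((remaining_input, value))` on success.
fn ws_comm<'a, T>(
    input: &'a str,
    inner: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> Option<(&'a str, T)> {
    let (rest, value) = inner(skip(input))?;
    Some((skip(rest), value))
}

/// Hypothetical token parser: a non-empty run of ASCII letters.
fn word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

With these definitions, ws_comm("  # c\n foo # d\nbar", word) yields Some(("bar", "foo")): the leading comment and the trailing comment are both discarded, and the remaining input starts at the next token.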
Related code, in case you are curious: https://github.com/ProgVal/rust-python-parser/blob/1e03122f030e183096d7d3271907106678036f56/src/helpers.rs