I'm trying to write a proc macro, block!, that parses syntax vaguely resembling JSX/RSX using the syn crate.
The following should be parsed into a block "wrapper" with two child blocks, "text_1" and "text_2":
let block = block! {
    <"wrapper">
        <"text_1">("hello")
        <"text_2">("world")
};
Using input.parse::<syn::Token![]>() when implementing syn::parse::Parse, I can parse everything all right.
However, there's one thing I can't find a solution to: I want to enforce a code style for the wrapper-children relation, so that a nested block must be preceded by at least two spaces or a tab, e.g.:
// -- Good --
let block = block! {
    <"text_1">("hello")
    <"text_2">("world")
};
// -- Good --
let block = block! {
    <"wrapper">
        <"text_1">("hello")
        <"text_2">("world")
};
// -- Bad --
// Need spaces/tabs before the children tags
let block = block! {
    <"wrapper">
    <"text_1">("hello")
    <"text_2">("world")
};
Parsing the TokenStream with Token![<], parse::<LitStr>(), Token![>], parenthesized! and bracketed! doesn't enforce the intended rules, and I can't find anything related to parsing tabs/spaces in the syn docs.
Judging by the TokenStream printed to the console, there isn't even anything between the wrapper's > and the < of the first child:
Input: TokenStream [
    Punct {
        ch: '<',
        spacing: Alone,
        span: #0 bytes(118..119),
    },
    Literal {
        kind: Str,
        symbol: "wrapper",
        suffix: None,
        span: #0 bytes(119..128),
    },
    Punct {
        ch: '>',
        spacing: Alone,
        span: #0 bytes(128..129),
    },
    Punct {
        ch: '<',
        spacing: Alone,
        span: #0 bytes(144..145),
    },
    Literal {
        kind: Str,
        symbol: "text_1",
        suffix: None,
        span: #0 bytes(145..153),
    },
    Punct {
        ch: '>',
        spacing: Alone,
        span: #0 bytes(153..154),
    },
Any advice on how this could be done, if it is possible at all?
Indentation is not significant in Rust's syntax, and whitespace is only occasionally required to separate what would otherwise be a single token into two tokens (think let name). Because its only functional purpose is to help delimit tokens, whitespace is not itself a token.
If your desired macro syntax relies on indentation, then you should use a string literal instead:
let block = block!(r#"
    <"wrapper">
        <"text_1">("hello")
        <"text_2">("world")
"#);
However, there is a hacky way to get what you want since Spans associated with tokens have a .source_text() method. Here's a simple macro that takes the source and turns it into a literal:
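use proc_macro::{Literal, TokenStream, TokenTree};

#[proc_macro]
pub fn block(tokens: TokenStream) -> TokenStream {
    // Take the first token tree (the extra `{ ... }` group) and recover the
    // original source text behind its span, whitespace included.
    let source = tokens
        .into_iter()
        .next()
        .unwrap()
        .span()
        .source_text()
        .unwrap();
    // Expand to a string literal containing that source text.
    TokenTree::Literal(Literal::string(&source)).into()
}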
use macros::block;

fn main() {
    let block = block! { {
        <"wrapper">
            <"text_1">("hello")
            <"text_2">("world")
    } };
    dbg!(block);
}
[src/main.rs:10] block = "{\n <\"wrapper\">\n <\"text_1\">(\"hello\")\n <\"text_2\">(\"world\")\n }"
You could then parse it line by line and compare the leading whitespace within your procedural macro. Note, though, that I added an extra { } since a TokenStream itself doesn't have an encompassing .span() available.
You should also heed the documentation, which says this is not an intended use case:
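To illustrate the line-by-line comparison, here is a minimal sketch in plain Rust that works on the recovered source string. The names leading_whitespace and check_children_indented are hypothetical, and the rule it applies is an assumption made for illustration: the first non-blank line is treated as the wrapper, and every later non-blank line is treated as its child, which must be indented deeper than the wrapper by at least two spaces or one tab. A real macro would take the nesting from the parsed tokens rather than assume it.

```rust
/// Returns the leading run of spaces and tabs of `line`.
fn leading_whitespace(line: &str) -> &str {
    let rest = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    &line[..line.len() - rest.len()]
}

/// Hypothetical rule (an assumption, not part of the original macro):
/// the first non-blank line is the wrapper and every later non-blank line
/// is its child. A child must start with the wrapper's own indentation
/// followed by at least two spaces or one tab.
/// On failure, returns the 1-based number of the first offending line,
/// counted from the line that holds the opening `{`.
fn check_children_indented(source: &str) -> Result<(), usize> {
    // Drop the extra `{ }` that the macro call wrapped around the markup.
    let body = source.trim();
    let body = body.strip_prefix('{').unwrap_or(body);
    let body = body.strip_suffix('}').unwrap_or(body);

    // Keep the line index so errors can point at the right line.
    let mut lines = body
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty());

    let base = match lines.next() {
        Some((_, wrapper)) => leading_whitespace(wrapper),
        None => return Ok(()), // nothing to check
    };

    for (index, line) in lines {
        // Whitespace the child has beyond the wrapper's indentation, if any.
        match leading_whitespace(line).strip_prefix(base) {
            Some(extra) if extra.starts_with("  ") || extra.starts_with('\t') => {}
            _ => return Err(index + 1),
        }
    }
    Ok(())
}
```

Inside the macro, an Err could be turned into a compile error at the call site. Since the check relies on source_text(), it only works where the span corresponds to real source code.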
Returns the source text behind a span. This preserves the original source code, including spaces and comments. It only returns a result if the span corresponds to real source code.
Note: The observable result of a macro should only rely on the tokens and not on this source text. The result of this function is a best effort to be used for diagnostics only.
I do not encourage you to actually do this. And as good as I'm sure your intentions are, a macro should not be enforcing style.