Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace one identifier in an expression with another one via Rust macro?

I'm trying to build a macro that does some code transformation, and should be able to parse its own syntax. Here is the simplest example I can think of:

replace!(x, y, x * 100 + z) ~> y * 100 + z

This macro should be able to replace the first identifier with the second in the expression provided as third parameter. The macro should have some understanding of the language of the third parameter (which in my particular case, as opposed to the example, wouldn't parse in Rust) and apply recursively over it.

What's the most effective way to build such a macro in Rust? I'm aware of the proc_macro approach and the macro_rules! one. However I am not sure whether macro_rules! is powerful enough to handle this and I couldn't find much documentation in how to build my own transformations using proc_macro. Can anyone point me in the right direction?

like image 322
hoheinzollern Avatar asked May 02 '19 11:05

hoheinzollern


People also ask

What does tt mean in Rust?

I'm reading a book about Rust, and start playing with Rust macros. All metavariable types are explained there and have examples, except the last one – tt . According to the book, it is a “a single token tree”.

Are macros allowed in Rust?

The most widely used form of macros in Rust is the declarative macro. These are also sometimes referred to as “macros by example,” “ macro_rules! macros,” or just plain “macros.” At their core, declarative macros allow you to write something similar to a Rust match expression.

What is Macro_use?

By default macros are scoped to their module, but #[macro_use] will let it escape to the parent context, including all submodules from there. See Macros By Example - The Rust Reference. Yandros July 17, 2021, 12:10pm #3.

What is derive macro in Rust?

Derive macros These macros can create new items given the token stream of a struct, enum, or union. They can also define derive macro helper attributes. Custom derive macros are defined by a public function with the proc_macro_derive attribute and a signature of (TokenStream) -> TokenStream .


1 Answers

Solution with macro_rules! macro

To implement this with declarative macros (macro_rules!) is a bit tricky but possible. However, it's necessary to use a few tricks.

But first, here is the code (Playground):

macro_rules! replace {
    // This is the "public interface". The only thing we do here is to delegate
    // to the actual implementation. The implementation is more complicated to
    // call, because it has an "out" parameter which accumulates the token we
    // will generate.
    ($x:ident, $y:ident, $($e:tt)*) => {
        replace!(@impl $x, $y, [], $($e)*)
    };

    // Recursion stop: if there are no tokens to check anymore, we just emit
    // what we accumulated in the out parameter so far.
    (@impl $x:ident, $y:ident, [$($out:tt)*], ) => {
        $($out)*
    };

    // This is the arm that's used when the first token in the stream is an
    // identifier. We potentially replace the identifier and push it to the
    // out tokens.
    (@impl $x:ident, $y:ident, [$($out:tt)*], $head:ident $($tail:tt)*) => {{
        replace!(
            @impl $x, $y, 
            [$($out)* replace!(@replace $x $y $head)],
            $($tail)*
        )
    }};

    // These arms are here to recurse into "groups" (tokens inside of a 
    // (), [] or {} pair)
    (@impl $x:ident, $y:ident, [$($out:tt)*], ( $($head:tt)* ) $($tail:tt)*) => {{
        replace!(
            @impl $x, $y, 
            [$($out)* ( replace!($x, $y, $($head)*) ) ], 
            $($tail)*
        )
    }};
    (@impl $x:ident, $y:ident, [$($out:tt)*], [ $($head:tt)* ] $($tail:tt)*) => {{
        replace!(
            @impl $x, $y, 
            [$($out)* [ replace!($x, $y, $($head)*) ] ], 
            $($tail)*
        )
    }};
    (@impl $x:ident, $y:ident, [$($out:tt)*], { $($head:tt)* } $($tail:tt)*) => {{
        replace!(
            @impl $x, $y, 
            [$($out)* { replace!($x, $y, $($head)*) } ], 
            $($tail)*
        )
    }};

    // This is the standard recusion case: we have a non-identifier token as
    // head, so we just put it into the out parameter.
    (@impl $x:ident, $y:ident, [$($out:tt)*], $head:tt $($tail:tt)*) => {{
        replace!(@impl $x, $y, [$($out)* $head], $($tail)*)
    }};

    // Helper to replace the identifier if its the needle. 
    (@replace $needle:ident $replacement:ident $i:ident) => {{
        // This is a trick to check two identifiers for equality. Note that 
        // the patterns in this macro don't contain any meta variables (the 
        // out meta variables $needle and $i are interpolated).
        macro_rules! __inner_helper {
            // Identifiers equal, emit $replacement
            ($needle $needle) => { $replacement };
            // Identifiers not equal, emit original
            ($needle $i) => { $i };                
        }

        __inner_helper!($needle $i)
    }}
}


fn main() {
    let foo = 3;
    let bar = 7;
    let z = 5;

    dbg!(replace!(abc, foo, bar * 100 + z));  // no replacement
    dbg!(replace!(bar, foo, bar * 100 + z));  // replace `bar` with `foo`
}

It outputs:

[src/main.rs:56] replace!(abc , foo , bar * 100 + z) = 705
[src/main.rs:57] replace!(bar , foo , bar * 100 + z) = 305

How does this work?

There are two main tricks one need to understand before understanding this macro: push down accumulation and how to check two identifiers for equality.

Furthermore, just to be sure: the @foobar things at the start of the macro pattern are not a special feature, but simply a convention to mark internal helper macros (also see: "The little book of Macros", StackOverflow question).


Push down accumulation is well described in this chapter of "The little book of Rust macros". The important part is:

All macros in Rust must result in a complete, supported syntax element (such as an expression, item, etc.). This means that it is impossible to have a macro expand to a partial construct.

But often it is necessary to have partial results, for example when dealing token for token with some input. To solve this, one basically has an "out" parameter which is just a list of tokens that grows with each recursive macro call. This works, because macro input can be arbitrary tokens and don't have to be a valid Rust construct.

This pattern only makes sense for macros that work as "incremental TT munchers", which my solution does. There is also a chapter about this pattern in TLBORM.


The second key point is to check two identifiers for equality. This is done with an interesting trick: the macro defines a new macro which is then immediately used. Let's take a look at the code:

(@replace $needle:ident $replacement:ident $i:ident) => {{
    macro_rules! __inner_helper {
        ($needle $needle) => { $replacement };
        ($needle $i) => { $i };                
    }

    __inner_helper!($needle $i)
}}

Let's go through two different invocations:

  • replace!(@replace foo bar baz): this expands to:

    macro_rules! __inner_helper {
        (foo foo) => { bar };
        (foo baz) => { baz };
    }
    
    __inner_helper!(foo baz)
    

    And the inner_helper! invocation now clearly takes the second pattern, resulting in baz.

  • replace!(@replace foo bar foo) on the other hand expands to:

    macro_rules! __inner_helper {
        (foo foo) => { bar };
        (foo foo) => { foo };
    }
    
    __inner_helper!(foo foo)
    

    This time, the inner_helper! invocation takes the first pattern, resulting in bar.

I learned this trick from a crate that offers basically only exactly that: a macro checking two identifiers for equality. But unfortunately, I cannot find this crate anymore. Let me know if you know the name of that crate!


This implementation has a few limitations, however:

  • As an incremental TT muncher, it recurses for each token in the input. So it's easy to reach the recursion limit (which can be increased, but it's not optimal). It could be possible to write a non-recursive version of this macro, but so far I haven't found a way to do that.

  • macro_rules! macros are a bit strange when it comes to identifiers. The solution presented above might behave strange with self as identifier. See this chapter for more information on that topic.


Solution with proc-macro

Of course this can also be done via a proc-macro. It also involves less strange tricks. My solution looks like this:

extern crate proc_macro;

use proc_macro::{
    Ident, TokenStream, TokenTree,
    token_stream,
};


#[proc_macro]
pub fn replace(input: TokenStream) -> TokenStream {
    let mut it = input.into_iter();

    // Get first parameters
    let needle = get_ident(&mut it);
    let _comma = it.next().unwrap();
    let replacement = get_ident(&mut it);
    let _comma = it.next().unwrap();

    // Return the remaining tokens, but replace identifiers.
    it.map(|tt| {
        match tt {
            // Comparing `Ident`s can only be done via string comparison right
            // now. Note that this ignores syntax contexts which can be a
            // problem in some situation.
            TokenTree::Ident(ref i) if i.to_string() == needle.to_string() => {
                TokenTree::Ident(replacement.clone())
            }

            // All other tokens are just forwarded
            other => other,
        }
    }).collect()
}

/// Extract an identifier from the iterator.
fn get_ident(it: &mut token_stream::IntoIter) -> Ident {
    match it.next() {
        Some(TokenTree::Ident(i)) => i,
        _ => panic!("oh noes!"),
    }
}

Using this proc macro with the main() example from above works exactly the same.

Note: error handling was ignored here to keep the example short. Please see this question on how to do error reporting in proc macros.

Apart from this, that code doesn't need as much explanations, I think. This proc macro version also doesn't suffer from the recursion limit problem as the macro_rules! macro.

like image 111
Lukas Kalbertodt Avatar answered Oct 25 '22 15:10

Lukas Kalbertodt