Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to pattern match infix operations with precedence in Rust macros?

Tags:

macros

rust

A very simple example would be to implement basic addition and multiplication in Rust macros.

compute!(1 + 2 * 3) // should evaluate to 7

I'm not completely sure it's possible due to Rust macro's limited grammar.

The point here is not to compute something at compile-time, but to be able to somehow parse the tokens (with precedence):

(term, terms*) => { parse_mul!(term) + (parse_mul!(terms))* } // this is not actual Rust!
like image 431
dragostis Avatar asked Apr 19 '16 14:04

dragostis


2 Answers

In theory, you can do it. In practice, it's a bad idea. I did it anyway. I posted this on reddit and was requested to transfer it here.

Such a macro would necessarily be a "tt muncher", a macro which recurses to itself to parse one token of its input at a time. This is required because as pointed out in the comments above, it's the only way to pull apart an expression like a + b. These so-called "future-proofing restrictions" are in place for good reason, and tt munchers circumvent them. The recursion also means that the time to expand the macro is linear in the length of the expression, at least. And rustc will give up expanding macros after recursing 64 times, by default (but you can change the limit on stable).

With those caveats in mind, let's see the macro! The strategy I have chosen is to transform the infix expression into postfix, and then evaluate the postfix expression, which is fairly simple to do. I very vaguely remember how to do this but since the objective here is macro craziness, not algorithmic tricks, I just followed the rules at the bottom of this helpful page.

Without further ado, the code (runnable version):

macro_rules! infix {
    // done converting
    (@cvt () $postfix:tt) => { infix!(@pfx () $postfix) };
    //                                |    |  ^ postfix expression
    //                                |    ^ operand stack
    //                                ^ postfix interpreter

    // infix to postfix conversion using the rules at the bottom of this page: http://csis.pace.edu/~wolf/CS122/infix-postfix.htm

    // at end of input, flush the operators to postfix
    (@cvt ($ophead:tt $($optail:tt)*) ($($postfix:tt)*)) => { infix!(@cvt ($($optail)*) ($($postfix)* $ophead)) };

    // 2. push an operator onto the stack if it's empty or has a left-paren on top
    (@cvt (                 ) $postfix:tt + $($tail:tt)*) => { infix!(@cvt (+               ) $postfix $($tail)*) };
    (@cvt (                 ) $postfix:tt - $($tail:tt)*) => { infix!(@cvt (-               ) $postfix $($tail)*) };
    (@cvt (                 ) $postfix:tt * $($tail:tt)*) => { infix!(@cvt (*               ) $postfix $($tail)*) };
    (@cvt (                 ) $postfix:tt / $($tail:tt)*) => { infix!(@cvt (/               ) $postfix $($tail)*) };
    (@cvt (LP $($optail:tt)*) $postfix:tt + $($tail:tt)*) => { infix!(@cvt (+ LP $($optail)*) $postfix $($tail)*) };
    (@cvt (LP $($optail:tt)*) $postfix:tt - $($tail:tt)*) => { infix!(@cvt (- LP $($optail)*) $postfix $($tail)*) };
    (@cvt (LP $($optail:tt)*) $postfix:tt * $($tail:tt)*) => { infix!(@cvt (* LP $($optail)*) $postfix $($tail)*) };
    (@cvt (LP $($optail:tt)*) $postfix:tt / $($tail:tt)*) => { infix!(@cvt (/ LP $($optail)*) $postfix $($tail)*) };

    // 3. push a left-paren onto the stack
    (@cvt ($($operator:tt)*) $postfix:tt ($($inner:tt)*) $($tail:tt)*) => { infix!(@cvt (LP $($operator)*) $postfix $($inner)* RP $($tail)*) };

    // 4. see right-paren, pop operators to postfix until left-paren
    (@cvt (LP         $($optail:tt)*) $postfix:tt       RP $($tail:tt)*) => { infix!(@cvt ($($optail)*) $postfix               $($tail)*   ) };
    (@cvt ($ophead:tt $($optail:tt)*) ($($postfix:tt)*) RP $($tail:tt)*) => { infix!(@cvt ($($optail)*) ($($postfix)* $ophead) RP $($tail)*) };

    // 5. if an operator w/ lower precedence is on top, just push
    (@cvt (+ $($optail:tt)*) $postfix:tt * $($tail:tt)*) => { infix!(@cvt (* + $($optail)*) $postfix $($tail)*) };
    (@cvt (- $($optail:tt)*) $postfix:tt * $($tail:tt)*) => { infix!(@cvt (* - $($optail)*) $postfix $($tail)*) };
    (@cvt (+ $($optail:tt)*) $postfix:tt / $($tail:tt)*) => { infix!(@cvt (/ + $($optail)*) $postfix $($tail)*) };
    (@cvt (- $($optail:tt)*) $postfix:tt / $($tail:tt)*) => { infix!(@cvt (/ - $($optail)*) $postfix $($tail)*) };

    // 6. if an operator w/ equal precedence is on top, pop and push
    (@cvt (+ $($optail:tt)*) ($($postfix:tt)*) + $($tail:tt)*) => { infix!(@cvt (+ $($optail)*) ($($postfix)* +) $($tail)*) };
    (@cvt (- $($optail:tt)*) ($($postfix:tt)*) - $($tail:tt)*) => { infix!(@cvt (- $($optail)*) ($($postfix)* -) $($tail)*) };
    (@cvt (+ $($optail:tt)*) ($($postfix:tt)*) - $($tail:tt)*) => { infix!(@cvt (- $($optail)*) ($($postfix)* +) $($tail)*) };
    (@cvt (- $($optail:tt)*) ($($postfix:tt)*) + $($tail:tt)*) => { infix!(@cvt (+ $($optail)*) ($($postfix)* -) $($tail)*) };
    (@cvt (* $($optail:tt)*) ($($postfix:tt)*) * $($tail:tt)*) => { infix!(@cvt (* $($optail)*) ($($postfix)* *) $($tail)*) };
    (@cvt (/ $($optail:tt)*) ($($postfix:tt)*) / $($tail:tt)*) => { infix!(@cvt (/ $($optail)*) ($($postfix)* /) $($tail)*) };
    (@cvt (* $($optail:tt)*) ($($postfix:tt)*) / $($tail:tt)*) => { infix!(@cvt (/ $($optail)*) ($($postfix)* *) $($tail)*) };
    (@cvt (/ $($optail:tt)*) ($($postfix:tt)*) * $($tail:tt)*) => { infix!(@cvt (* $($optail)*) ($($postfix)* /) $($tail)*) };

    // 7. if an operator w/ higher precedence is on top, pop it to postfix
    (@cvt (* $($optail:tt)*) ($($postfix:tt)*) + $($tail:tt)*) => { infix!(@cvt ($($optail)*) ($($postfix)* *) + $($tail)*) };
    (@cvt (* $($optail:tt)*) ($($postfix:tt)*) - $($tail:tt)*) => { infix!(@cvt ($($optail)*) ($($postfix)* *) - $($tail)*) };
    (@cvt (/ $($optail:tt)*) ($($postfix:tt)*) + $($tail:tt)*) => { infix!(@cvt ($($optail)*) ($($postfix)* /) + $($tail)*) };
    (@cvt (/ $($optail:tt)*) ($($postfix:tt)*) - $($tail:tt)*) => { infix!(@cvt ($($optail)*) ($($postfix)* /) - $($tail)*) };

    // 1. operands go to the postfix output
    (@cvt $operators:tt ($($postfix:tt)*) $head:tt $($tail:tt)*) => { infix!(@cvt $operators ($($postfix)* ($head)) $($tail)*) };

    // postfix interpreter
    (@pfx ($result:expr                     ) (                     )) => { $result };
    (@pfx (($a:expr) ($b:expr) $($stack:tt)*) (+        $($tail:tt)*)) => { infix!(@pfx ((($b + $a)) $($stack)*) ($($tail)*)) };
    (@pfx (($a:expr) ($b:expr) $($stack:tt)*) (-        $($tail:tt)*)) => { infix!(@pfx ((($b - $a)) $($stack)*) ($($tail)*)) };
    (@pfx (($a:expr) ($b:expr) $($stack:tt)*) (*        $($tail:tt)*)) => { infix!(@pfx ((($b * $a)) $($stack)*) ($($tail)*)) };
    (@pfx (($a:expr) ($b:expr) $($stack:tt)*) (/        $($tail:tt)*)) => { infix!(@pfx ((($b / $a)) $($stack)*) ($($tail)*)) };
    (@pfx ($($stack:tt)*                    ) ($head:tt $($tail:tt)*)) => { infix!(@pfx ($head       $($stack)*) ($($tail)*)) };

    ($($t:tt)*) => { infix!(@cvt () () $($t)*) }
    //                      |    |  |  ^ infix expression
    //                      |    |  ^ postfix expression
    //                      |    ^ operator stack
    //                      ^ convert infix to postfix
}

fn main() {
    println!("{}", infix!(1 + 2 * 3));
    println!("{}", infix!(1 * 2 + 3));
    println!("{}", infix!(((1 + 2) * 3) * 3));
    println!("{}", infix!(( 1 + 2  * 3) * 3));
    println!("{}", infix!(1 - 2 - 1));
}

Most of the macro tricks I used here can be found in The Little Book of Rust Macros. You can see that the macro is divided into three sections: the infix-to-postfix conversion (all rules that start with @cvt), the postfix interpreter (all rules that start with @pfx), and the single entry point (the last rule, with no prefix).

The converter uses an operator stack and builds up the postfix output string as it chews through the input. Parentheses are converted to LP and RP to keep the input as a linear stream of tokens (normally macro_rules requires parentheses to stay balanced and matches a parenthesized group as a single token-tree). All operators are considered to be right-associative, and PEMDAS applies (* and / have precedence over + and -).

The interpreter uses an operand stack and evaluates the expression in a rather straightforward way: push operands onto the stack, and when encountering an operator pop off two operands and apply the operator. The result of the postfix interpreter is an expression quite similar to the original infix expression, but with everything parenthesized to simulate operator precedence. We then rely on rustc to do the actual arithmetic :)

A few examples are included at the end of the code. Let me know if you find any bugs! One limitation is that each operand must be a single token-tree, so input like 5.0f32.sqrt() will cause a parse error and multi-token literals like -2 might cause wrong answers. You can fix this with curly braces, e.g. infix!({-2.0} - {5.0f32.sqrt()}) (it could be fixed by complicating the macro, too).

like image 111
durka42 Avatar answered Nov 16 '22 03:11

durka42


There are serious limitations to what you can do with macros. E.g. you cannot have parsing ambiguities. So you can't have an expression that expects a + after it. This means we need to separate our parsing tokens by e.g. a comma. Then we need to specify the basic binary operations. And finally a mapping from infix to infix with brackets or to prefix. An example using the infix to infix with brackets method is:

macro_rules! compute {
    ($a:expr, +, $b:expr) => {{ add($a, $b) }};
    ($a:expr, *, $b:expr) => {{ mul($a, $b) }};
    ($a:expr, +, $($rest:tt)*) => {{
        compute!($a, +, compute!($($rest)*))
    }};
    ($a:expr, *, $b:expr, $($rest:tt)*) => {{
        compute!(compute!($a, *, $b), $($rest)*)
    }};
}

Playground

You can now call this macro almost like in your question: compute!(1, +, 2, *, 3)

like image 41
oli_obk Avatar answered Nov 16 '22 04:11

oli_obk