Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to represent Higher-Order Abstract Syntax in Rust?

In Haskell, it is very easy to write algebraic data types (ADTs) with functions. This allows us to write interpreters that rely on native functions for substitutions, i.e., a higher-order abstract syntax (HOAS), which is known to be very efficient. For example, this is a simple λ-calculus interpreter using that technique:

data Term
  = Hol Term
  | Var Int
  | Lam (Term -> Term)
  | App Term Term

pretty :: Term -> String
pretty = go 0 where
  go lvl term = case term of
    Hol hol     -> go lvl hol 
    Var idx     -> "x" ++ show idx
    Lam bod     -> "λx" ++ show lvl ++ ". " ++ go (lvl+1) (bod (Hol (Var lvl)))
    App fun arg -> "(" ++ go lvl fun ++ " " ++ go lvl arg ++ ")"

reduce :: Term -> Term
reduce (Hol hol)     = hol
reduce (Var idx)     = Var idx
reduce (Lam bod)     = Lam (\v -> reduce (bod v))
reduce (App fun arg) = case reduce fun of
  Hol fhol      -> App (Hol fhol) (reduce arg)
  Var fidx      -> App (Var fidx) (reduce arg)
  Lam fbod      -> fbod (reduce arg)
  App ffun farg -> App (App ffun farg) (reduce arg)

main :: IO ()
main
  = putStrLn . pretty . reduce
  $ App
    (Lam$ \x -> App x x)
    (Lam$ \s -> Lam$ \z -> App s (App s (App s z)))

Notice how native functions were used rather than de Bruijn indices. That makes the interpreter considerably faster than it would be if we substituted applications manually.

I'm aware Rust has closures and many Fn() types, but I'm not sure they work exactly like Haskell closures in this situation, much less how to express that program given the low-level nature of Rust. Is it possible to represent HOAS in Rust? How would the Term datatype be represented?

like image 516
MaiaVictor Avatar asked Jul 05 '18 02:07

MaiaVictor


People also ask

Does rust have higher order functions?

Rust provides Higher Order Functions (HOF). These are functions that take one or more functions and/or produce a more useful function. HOFs and lazy iterators give Rust its functional flavor.

What is high order abstraction?

In computer science, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of abstract syntax trees for languages with variable binders.


2 Answers

As a fan of lambda calculus I decided to attempt this and it is indeed possible, though a bit less sightly than in Haskell (playground link):

use std::rc::Rc;
use Term::*;

#[derive(Clone)]
enum Term {
    Hol(Box<Term>),
    Var(usize),
    Lam(Rc<dyn Fn(Term) -> Term>),
    App(Box<Term>, Box<Term>),
}

impl Term {
    fn app(t1: Term, t2: Term) -> Self {
        App(Box::new(t1), Box::new(t2))
    }

    fn lam<F: Fn(Term) -> Term + 'static>(f: F) -> Self {
        Lam(Rc::new(f))
    }

    fn hol(t: Term) -> Self {
        Hol(Box::new(t))
    }
}

fn pretty(term: Term) -> String {
    fn go(lvl: usize, term: Term) -> String {
        match term {
            Hol(hol) => go(lvl, *hol),
            Var(idx) => format!("x{}", idx),
            Lam(bod) => format!("λx{}. {}", lvl, go(lvl + 1, bod(Term::hol(Var(lvl))))),
            App(fun, arg) => format!("({} {})", go(lvl, *fun), go(lvl, *arg)),
        }
    }

    go(0, term)
}

fn reduce(term: Term) -> Term {
    match term {
        Hol(hol) => *hol,
        Var(idx) => Var(idx),
        Lam(bod) => Term::lam(move |v| reduce(bod(v))),
        App(fun, arg) => match reduce(*fun) {
            Hol(fhol) => Term::app(Hol(fhol), reduce(*arg)),
            Var(fidx) => Term::app(Var(fidx), reduce(*arg)),
            Lam(fbod) => fbod(reduce(*arg)),
            App(ffun, farg) => Term::app(Term::app(*ffun, *farg), reduce(*arg)),
        },
    }
}

fn main() {
    // (λx. x x) (λs. λz. s (s (s z)))
    let term1 = Term::app(
        Term::lam(|x| Term::app(x.clone(), x.clone())), 
        Term::lam(|s| Term::lam(move |z| 
            Term::app(
                s.clone(),
                Term::app(
                    s.clone(),
                    Term::app(
                        s.clone(),
                        z.clone()
    ))))));

    // λb. λt. λf. b t f
    let term2 = Term::lam(|b| Term::lam(move |t| 
        Term::lam({
            let b = b.clone(); // necessary to satisfy the borrow checker
            move |f| Term::app(Term::app(b.clone(), t.clone()), f)
        })
    ));

    println!("{}", pretty(reduce(term1))); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
    println!("{}", pretty(reduce(term2))); // λx0. λx1. λx2. ((x0 x1) x2)
}

Thanks to BurntSushi5 for the suggestion to use Rc that I always forget exists and to Shepmaster for suggesting to remove the unnecessary Box under Rc in Lam and how to satisfy the borrow checker in longer Lam chains.

like image 77
ljedrz Avatar answered Oct 19 '22 18:10

ljedrz


The accepted solution uses Rc to create a clonable heap allocated closure.

Technically speaking, this is not necessary as there is no runtime reference counting needed. All we need is a closure as a trait object and that is also clonable.

However, Rust 1.29.2 does not allow us to have things like dyn Clone + FnOnce(Term) -> Term, this restriction may be relaxed in the future. The restriction has two factors: Clone is not object safe (which is unlikely to be relaxed) and if we combine two traits together, one of them has to be an auto trait (this can be relaxed IMHO).

Whilst waiting for the language improvement, we can introduce a new trait to get around this:

// Combination of FnOnce(Term) -> Term and Clone
trait TermLam {
    // The FnOnce part, declared like an Fn, because we need object safety
    fn app(&self, t: Term) -> Term;
    // The Clone part, but we have to return sized objects 
    // (not Self either because of object safety), so it is in a box
    fn clone_box(&self) -> Box<dyn TermLam>;
}

// Blanket implementation for appropriate types
impl<F> TermLam for F
where
    F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term
{
    // Note: when you have a Clone + FnOnce, you effectively have an Fn
    fn app(&self, t: Term) -> Term {
        (self.clone())(t)
    }

    fn clone_box(&self) -> Box<dyn TermLam> {
        Box::new(self.clone())
    }
}

// We can now clone the box
impl Clone for Box<dyn TermLam> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

Then we can remove the need to use Rc.

#[derive(Clone)]
enum Term {
    Hol(Box<Term>),
    Var(usize),
    Lam(Box<dyn TermLam>),
    App(Box<Term>, Box<Term>),
}

impl Term {
    fn app(t1: Term, t2: Term) -> Self {
        App(Box::new(t1), Box::new(t2))
    }

    fn lam<F>(f: F) -> Self
    where
       F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term        
    {
        Lam(Box::new(f))
    }

    fn hol(t: Term) -> Self {
        Hol(Box::new(t))
    }
}

fn pretty(term: Term) -> String {
    fn go(lvl: usize, term: Term) -> String {
        match term {
            Hol(hol) => go(lvl, *hol),
            Var(idx) => format!("x{}", idx),
            Lam(bod) => format!("λx{}. {}", lvl, go(lvl + 1, bod.app(Term::hol(Var(lvl))))),
            App(fun, arg) => format!("({} {})", go(lvl, *fun), go(lvl, *arg)),
        }
    }

    go(0, term)
}

fn reduce(term: Term) -> Term {
    match term {
        Hol(hol) => *hol,
        Var(idx) => Var(idx),
        Lam(bod) => Term::lam(move |v| reduce(bod.app(v))),
        App(fun, arg) => match reduce(*fun) {
            Hol(fhol) => Term::app(Hol(fhol), reduce(*arg)),
            Var(fidx) => Term::app(Var(fidx), reduce(*arg)),
            Lam(fbod) => fbod.app(reduce(*arg)),
            App(ffun, farg) => Term::app(Term::app(*ffun, *farg), reduce(*arg)),
        },
    }
}

fn main() {
    // (λx. x x) (λs. λz. s (s (s z)))
    let term1 = Term::app(
        Term::lam(|x| Term::app(x.clone(), x.clone())),
        Term::lam(|s| {
            Term::lam(move |z| {
                Term::app(
                    s.clone(),
                    Term::app(s.clone(), Term::app(s.clone(), z.clone())),
                )
            })
        }),
    );

    // λb. λt. λf. b t f
    let term2 = Term::lam(|b| {
        Term::lam(move |t| {
            Term::lam({
                //let b = b.clone(); No longer necessary for Rust 1.29.2
                move |f| Term::app(Term::app(b.clone(), t.clone()), f)
            })
        })
    });

    println!("{}", pretty(reduce(term1))); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
    println!("{}", pretty(reduce(term2))); // λx0. λx1. λx2. ((x0 x1) x2)
}

This was the original way that the other answer attempted, which the author was unable to resolve.

Optimize!

Rust is known to be achieve performance without sacrificing safety. The implementation above however, always passes Terms by value and have many unnecessary clone calls, so some optimizations can be done.

Also, the standard way to stringify a piece of Rust data is to use the Display trait. So let's make those right!

use std::fmt::{Display, Error, Formatter};
use Term::*;

// Combination of FnOnce(Term) -> Term and Clone
trait TermLam {
    // The FnOnce part, declared like an Fn, because we need object safety
    fn app(&self, t: Term) -> Term;
    // The Clone part, but we have to return sized objects
    // (not Self either because of object safety), so it is in a box
    fn clone_box(&self) -> Box<dyn TermLam>;
}

// Blanket implementation for appropriate types
impl<F> TermLam for F
where
    F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term,
{
    // Note: when you have a Clone + FnOnce, you effectively have an Fn
    fn app(&self, t: Term) -> Term {
        (self.clone())(t)
    }

    fn clone_box(&self) -> Box<dyn TermLam> {
        Box::new(self.clone())
    }
}

// We can now clone the box
impl Clone for Box<dyn TermLam> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

#[derive(Clone)]
enum Term {
    Hol(Box<Term>),
    Var(usize),
    Lam(Box<dyn TermLam>),
    App(Box<Term>, Box<Term>),
}

impl Term {
    fn app(t1: Term, t2: Term) -> Self {
        App(Box::new(t1), Box::new(t2))
    }

    fn lam<F>(f: F) -> Self
    where
        F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term,
    {
        Lam(Box::new(f))
    }

    fn hol(t: Term) -> Self {
        Hol(Box::new(t))
    }

    // `reduce` is now a by-reference method
    fn reduce(&self) -> Term {
        match self {
            Hol(_) => self.clone(),
            Var(_) => self.clone(),
            Lam(bod) => {
                let bod = bod.clone();
                Term::lam(move |v| bod.app(v).reduce())
            },
            // We reuse the reduced object when possible,
            // to avoid unnecessary clone.
            App(fun, arg) => match fun.reduce() {
                other @ Hol(_) => Term::app(other, arg.reduce()),
                other @ Var(_) => Term::app(other, arg.reduce()),
                Lam(fbod) => fbod.app(arg.reduce()),
                other @ App(_, _) => Term::app(other, arg.reduce()),
            },
        }
    }
}
//The standard way of `pretty` is `Display`
impl Display for Term {
    fn fmt(&self, fmt: &mut Formatter) -> Result<(), Error> {
        // As the API is different from `pretty`, the way we do recursion is
        // a bit different as well
        struct LvlTerm<'a>(usize, &'a Term);
        impl<'a> Display for LvlTerm<'a> {
            fn fmt(&self, fmt: &mut Formatter) -> Result<(), Error> {
                match self {
                    LvlTerm(lvl, Hol(hol)) => write!(fmt, "{}", LvlTerm(*lvl, hol)),
                    LvlTerm(_, Var(idx)) => write!(fmt, "x{}", idx),
                    LvlTerm(lvl, Lam(bod)) => write!(
                        fmt,
                        "λx{}. {}",
                        *lvl,
                        LvlTerm(*lvl + 1, &bod.app(Term::hol(Var(*lvl))))
                    ),
                    LvlTerm(lvl, App(fun, arg)) => {
                        write!(fmt, "({} {})", LvlTerm(*lvl, fun), LvlTerm(*lvl, arg))
                    }
                }
            }
        }
        write!(fmt, "{}", LvlTerm(0, self))
    }
}

fn main() {
    // In general, if you need to use a value n+1 times, you need to
    // call clone it n times. You don't have to clone it in the last use.
    // (λx. x x) (λs. λz. s (s (s z)))
    let term1 = Term::app(
        Term::lam(|x| Term::app(x.clone(), x)),
        Term::lam(|s| {
            Term::lam(move |z| Term::app(s.clone(), Term::app(s.clone(), Term::app(s, z))))
        }),
    );

    // No clone is required if all values are used exactly once.
    // λb. λt. λf. b t f
    let term2 =
        Term::lam(|b| Term::lam(move |t| Term::lam(move |f| Term::app(Term::app(b, t), f))));

    println!("{}", term1.reduce()); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
    println!("{}", term2.reduce()); // λx0. λx1. λx2. ((x0 x1) x2)
}

Playground

We can see that, the code above can be further simplified: as the match arms in reduce have code duplication, we can collapse them together. However, as the purpose of this answer is to demonstrate doing things the SAME way the Haskell code in the question does, so this was left as it is.

Also, by only requiring the closures to be FnOnce, not Fn, we relaxed in the usage site the requirement of cloning all variables to only when it is use more than once. The trade off is that whenever the closure was called, all variables it captures will be cloned. It is hard to tell which one is better until profiling on it; so I just pick the one that makes the code looks better.

Again, Rust makes those decisions explicit and keeps the differences of choices in you mind, this is good!

like image 5
Earth Engine Avatar answered Oct 19 '22 16:10

Earth Engine