In Haskell, it is very easy to write algebraic data types (ADTs) with functions. This allows us to write interpreters that rely on native functions for substitutions, i.e., a higher-order abstract syntax (HOAS), which is known to be very efficient. For example, this is a simple λ-calculus interpreter using that technique:
data Term
= Hol Term
| Var Int
| Lam (Term -> Term)
| App Term Term
pretty :: Term -> String
pretty = go 0 where
go lvl term = case term of
Hol hol -> go lvl hol
Var idx -> "x" ++ show idx
Lam bod -> "λx" ++ show lvl ++ ". " ++ go (lvl+1) (bod (Hol (Var lvl)))
App fun arg -> "(" ++ go lvl fun ++ " " ++ go lvl arg ++ ")"
reduce :: Term -> Term
reduce (Hol hol) = hol
reduce (Var idx) = Var idx
reduce (Lam bod) = Lam (\v -> reduce (bod v))
reduce (App fun arg) = case reduce fun of
Hol fhol -> App (Hol fhol) (reduce arg)
Var fidx -> App (Var fidx) (reduce arg)
Lam fbod -> fbod (reduce arg)
App ffun farg -> App (App ffun farg) (reduce arg)
main :: IO ()
main
= putStrLn . pretty . reduce
$ App
(Lam$ \x -> App x x)
(Lam$ \s -> Lam$ \z -> App s (App s (App s z)))
Notice how native functions were used rather than de Bruijn indices. That makes the interpreter considerably faster than it would be if we substituted applications manually.
I'm aware Rust has closures and many Fn()
types, but I'm not sure they work exactly like Haskell closures in this situation, much less how to express that program given the low-level nature of Rust. Is it possible to represent HOAS in Rust? How would the Term
datatype be represented?
Rust provides Higher Order Functions (HOF). These are functions that take one or more functions and/or produce a more useful function. HOFs and lazy iterators give Rust its functional flavor.
In computer science, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of abstract syntax trees for languages with variable binders.
As a fan of lambda calculus I decided to attempt this and it is indeed possible, though a bit less sightly than in Haskell (playground link):
use std::rc::Rc;
use Term::*;
#[derive(Clone)]
enum Term {
Hol(Box<Term>),
Var(usize),
Lam(Rc<dyn Fn(Term) -> Term>),
App(Box<Term>, Box<Term>),
}
impl Term {
fn app(t1: Term, t2: Term) -> Self {
App(Box::new(t1), Box::new(t2))
}
fn lam<F: Fn(Term) -> Term + 'static>(f: F) -> Self {
Lam(Rc::new(f))
}
fn hol(t: Term) -> Self {
Hol(Box::new(t))
}
}
fn pretty(term: Term) -> String {
fn go(lvl: usize, term: Term) -> String {
match term {
Hol(hol) => go(lvl, *hol),
Var(idx) => format!("x{}", idx),
Lam(bod) => format!("λx{}. {}", lvl, go(lvl + 1, bod(Term::hol(Var(lvl))))),
App(fun, arg) => format!("({} {})", go(lvl, *fun), go(lvl, *arg)),
}
}
go(0, term)
}
fn reduce(term: Term) -> Term {
match term {
Hol(hol) => *hol,
Var(idx) => Var(idx),
Lam(bod) => Term::lam(move |v| reduce(bod(v))),
App(fun, arg) => match reduce(*fun) {
Hol(fhol) => Term::app(Hol(fhol), reduce(*arg)),
Var(fidx) => Term::app(Var(fidx), reduce(*arg)),
Lam(fbod) => fbod(reduce(*arg)),
App(ffun, farg) => Term::app(Term::app(*ffun, *farg), reduce(*arg)),
},
}
}
fn main() {
// (λx. x x) (λs. λz. s (s (s z)))
let term1 = Term::app(
Term::lam(|x| Term::app(x.clone(), x.clone())),
Term::lam(|s| Term::lam(move |z|
Term::app(
s.clone(),
Term::app(
s.clone(),
Term::app(
s.clone(),
z.clone()
))))));
// λb. λt. λf. b t f
let term2 = Term::lam(|b| Term::lam(move |t|
Term::lam({
let b = b.clone(); // necessary to satisfy the borrow checker
move |f| Term::app(Term::app(b.clone(), t.clone()), f)
})
));
println!("{}", pretty(reduce(term1))); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
println!("{}", pretty(reduce(term2))); // λx0. λx1. λx2. ((x0 x1) x2)
}
Thanks to BurntSushi5 for the suggestion to use Rc
that I always forget exists and to Shepmaster for suggesting to remove the unnecessary Box
under Rc
in Lam
and how to satisfy the borrow checker in longer Lam
chains.
The accepted solution uses Rc
to create a clonable heap allocated closure.
Technically speaking, this is not necessary as there is no runtime reference counting needed. All we need is a closure as a trait object and that is also clonable.
However, Rust 1.29.2 does not allow us to have things like dyn Clone + FnOnce(Term) -> Term
, this restriction may be relaxed in the future. The restriction has two factors: Clone
is not object safe (which is unlikely to be relaxed) and if we combine two traits together, one of them has to be an auto trait (this can be relaxed IMHO).
Whilst waiting for the language improvement, we can introduce a new trait to get around this:
// Combination of FnOnce(Term) -> Term and Clone
trait TermLam {
// The FnOnce part, declared like an Fn, because we need object safety
fn app(&self, t: Term) -> Term;
// The Clone part, but we have to return sized objects
// (not Self either because of object safety), so it is in a box
fn clone_box(&self) -> Box<dyn TermLam>;
}
// Blanket implementation for appropriate types
impl<F> TermLam for F
where
F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term
{
// Note: when you have a Clone + FnOnce, you effectively have an Fn
fn app(&self, t: Term) -> Term {
(self.clone())(t)
}
fn clone_box(&self) -> Box<dyn TermLam> {
Box::new(self.clone())
}
}
// We can now clone the box
impl Clone for Box<dyn TermLam> {
fn clone(&self) -> Self {
self.clone_box()
}
}
Then we can remove the need to use Rc
.
#[derive(Clone)]
enum Term {
Hol(Box<Term>),
Var(usize),
Lam(Box<dyn TermLam>),
App(Box<Term>, Box<Term>),
}
impl Term {
fn app(t1: Term, t2: Term) -> Self {
App(Box::new(t1), Box::new(t2))
}
fn lam<F>(f: F) -> Self
where
F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term
{
Lam(Box::new(f))
}
fn hol(t: Term) -> Self {
Hol(Box::new(t))
}
}
fn pretty(term: Term) -> String {
fn go(lvl: usize, term: Term) -> String {
match term {
Hol(hol) => go(lvl, *hol),
Var(idx) => format!("x{}", idx),
Lam(bod) => format!("λx{}. {}", lvl, go(lvl + 1, bod.app(Term::hol(Var(lvl))))),
App(fun, arg) => format!("({} {})", go(lvl, *fun), go(lvl, *arg)),
}
}
go(0, term)
}
fn reduce(term: Term) -> Term {
match term {
Hol(hol) => *hol,
Var(idx) => Var(idx),
Lam(bod) => Term::lam(move |v| reduce(bod.app(v))),
App(fun, arg) => match reduce(*fun) {
Hol(fhol) => Term::app(Hol(fhol), reduce(*arg)),
Var(fidx) => Term::app(Var(fidx), reduce(*arg)),
Lam(fbod) => fbod.app(reduce(*arg)),
App(ffun, farg) => Term::app(Term::app(*ffun, *farg), reduce(*arg)),
},
}
}
fn main() {
// (λx. x x) (λs. λz. s (s (s z)))
let term1 = Term::app(
Term::lam(|x| Term::app(x.clone(), x.clone())),
Term::lam(|s| {
Term::lam(move |z| {
Term::app(
s.clone(),
Term::app(s.clone(), Term::app(s.clone(), z.clone())),
)
})
}),
);
// λb. λt. λf. b t f
let term2 = Term::lam(|b| {
Term::lam(move |t| {
Term::lam({
//let b = b.clone(); No longer necessary for Rust 1.29.2
move |f| Term::app(Term::app(b.clone(), t.clone()), f)
})
})
});
println!("{}", pretty(reduce(term1))); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
println!("{}", pretty(reduce(term2))); // λx0. λx1. λx2. ((x0 x1) x2)
}
This was the original way that the other answer attempted, which the author was unable to resolve.
Rust is known to be achieve performance without sacrificing safety. The implementation above however, always passes Term
s by value and have many unnecessary clone
calls, so some optimizations can be done.
Also, the standard way to stringify a piece of Rust data is to use the Display
trait. So let's make those right!
use std::fmt::{Display, Error, Formatter};
use Term::*;
// Combination of FnOnce(Term) -> Term and Clone
trait TermLam {
// The FnOnce part, declared like an Fn, because we need object safety
fn app(&self, t: Term) -> Term;
// The Clone part, but we have to return sized objects
// (not Self either because of object safety), so it is in a box
fn clone_box(&self) -> Box<dyn TermLam>;
}
// Blanket implementation for appropriate types
impl<F> TermLam for F
where
F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term,
{
// Note: when you have a Clone + FnOnce, you effectively have an Fn
fn app(&self, t: Term) -> Term {
(self.clone())(t)
}
fn clone_box(&self) -> Box<dyn TermLam> {
Box::new(self.clone())
}
}
// We can now clone the box
impl Clone for Box<dyn TermLam> {
fn clone(&self) -> Self {
self.clone_box()
}
}
#[derive(Clone)]
enum Term {
Hol(Box<Term>),
Var(usize),
Lam(Box<dyn TermLam>),
App(Box<Term>, Box<Term>),
}
impl Term {
fn app(t1: Term, t2: Term) -> Self {
App(Box::new(t1), Box::new(t2))
}
fn lam<F>(f: F) -> Self
where
F: 'static/*' highlighting fix */ + Clone + FnOnce(Term) -> Term,
{
Lam(Box::new(f))
}
fn hol(t: Term) -> Self {
Hol(Box::new(t))
}
// `reduce` is now a by-reference method
fn reduce(&self) -> Term {
match self {
Hol(_) => self.clone(),
Var(_) => self.clone(),
Lam(bod) => {
let bod = bod.clone();
Term::lam(move |v| bod.app(v).reduce())
},
// We reuse the reduced object when possible,
// to avoid unnecessary clone.
App(fun, arg) => match fun.reduce() {
other @ Hol(_) => Term::app(other, arg.reduce()),
other @ Var(_) => Term::app(other, arg.reduce()),
Lam(fbod) => fbod.app(arg.reduce()),
other @ App(_, _) => Term::app(other, arg.reduce()),
},
}
}
}
//The standard way of `pretty` is `Display`
impl Display for Term {
fn fmt(&self, fmt: &mut Formatter) -> Result<(), Error> {
// As the API is different from `pretty`, the way we do recursion is
// a bit different as well
struct LvlTerm<'a>(usize, &'a Term);
impl<'a> Display for LvlTerm<'a> {
fn fmt(&self, fmt: &mut Formatter) -> Result<(), Error> {
match self {
LvlTerm(lvl, Hol(hol)) => write!(fmt, "{}", LvlTerm(*lvl, hol)),
LvlTerm(_, Var(idx)) => write!(fmt, "x{}", idx),
LvlTerm(lvl, Lam(bod)) => write!(
fmt,
"λx{}. {}",
*lvl,
LvlTerm(*lvl + 1, &bod.app(Term::hol(Var(*lvl))))
),
LvlTerm(lvl, App(fun, arg)) => {
write!(fmt, "({} {})", LvlTerm(*lvl, fun), LvlTerm(*lvl, arg))
}
}
}
}
write!(fmt, "{}", LvlTerm(0, self))
}
}
fn main() {
// In general, if you need to use a value n+1 times, you need to
// call clone it n times. You don't have to clone it in the last use.
// (λx. x x) (λs. λz. s (s (s z)))
let term1 = Term::app(
Term::lam(|x| Term::app(x.clone(), x)),
Term::lam(|s| {
Term::lam(move |z| Term::app(s.clone(), Term::app(s.clone(), Term::app(s, z))))
}),
);
// No clone is required if all values are used exactly once.
// λb. λt. λf. b t f
let term2 =
Term::lam(|b| Term::lam(move |t| Term::lam(move |f| Term::app(Term::app(b, t), f))));
println!("{}", term1.reduce()); // λx0. λx1. (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 (x0 x1)))))))))))))))))))))))))))
println!("{}", term2.reduce()); // λx0. λx1. λx2. ((x0 x1) x2)
}
Playground
We can see that, the code above can be further simplified: as the match arms in reduce
have code duplication, we can collapse them together. However, as the purpose of this answer is to demonstrate doing things the SAME way the Haskell code in the question does, so this was left as it is.
Also, by only requiring the closures to be FnOnce
, not Fn
, we relaxed in the usage site the requirement of cloning all variables to only when it is use more than once. The trade off is that whenever the closure was called, all variables it captures will be cloned. It is hard to tell which one is better until profiling on it; so I just pick the one that makes the code looks better.
Again, Rust makes those decisions explicit and keeps the differences of choices in you mind, this is good!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With