Implementing lazy functional languages

Question

When implementing a lazy functional language, it is necessary to store values as unevaluated thunks, to be evaluated only when needed.

One of the challenges of an efficient implementation, as discussed in e.g. The Spineless Tagless G-machine, is that this evaluation must be carried out only once for each thunk, and subsequent accesses must reuse the calculated value - failure to do this would lead to at least quadratic slowdown (perhaps exponential? I'm not sure off the top of my head.)

I'm looking for a simple example implementation whose operation is easily understood (as opposed to an industrial-strength implementation like GHC which is designed for performance over simplicity). I came across minihaskell at http://www.andrej.com/plzoo/ which contains the following code.

As it is dubbed "an efficient interpreter", I would assume it does indeed carry out each evaluation only once and save the calculated value for reuse, but I'm having difficulty seeing where and how; I can only see one assignment statement in the interpreter itself, and that doesn't look like it's overwriting part of a thunk record.

So my question is, is this interpreter indeed doing such caching, and if so where and how? (And if not, what's the simplest extant implementation that does do so?)

Code from http://www.andrej.com/plzoo/html/minihaskell.html

let rec interp env = function
  | Var x ->
     (try
     let r = List.assoc x env in
       match !r with
           VClosure (env', e) -> let v = interp env' e in r := v ; v
         | v -> v
       with
       Not_found -> runtime_error ("Unknown variable " ^ x))
   ... snipping the easy stuff ...
  | Fun _ as e -> VClosure (env, e)
  | Apply (e1, e2) ->
      (match interp env e1 with
       VClosure (env', Fun (x, _, e)) ->
         interp ((x, ref (VClosure (env, e2)))::env') e
     | _ -> runtime_error "Function expected in application")
  | Pair _ as e ->  VClosure (env, e)
  | Fst e ->
      (match interp env e with
       VClosure (env', Pair (e1, e2)) -> interp env' e1
     | _ -> runtime_error "Pair expected in fst")
  | Snd e ->
      (match interp env e with
       VClosure (env', Pair (e1, e2)) -> interp env' e2
     | _ -> runtime_error "Pair expected in snd")
  | Rec (x, _, e) -> 
      let rec env' = (x,ref (VClosure (env',e))) :: env in
    interp env' e
  | Nil ty -> VNil ty
  | Cons _ as e -> VClosure (env, e)
  | Match (e1, _, e2, x, y, e3) ->
      (match interp env e1 with
       VNil _ -> interp env e2
     | VClosure (env', Cons (d1, d2)) ->
         interp ((x,ref (VClosure(env',d1)))::(y,ref (VClosure(env',d2)))::env) e3
     | _ -> runtime_error "List expected in match")

Edward Z. Yang · Accepted Answer

The key are the records: notice !r, r := v. Whenever we lookup a variable from the environment, we actually get back a record, which we dereference to see if it's a thunk. If it is a thunk, we evaluate it and then save the result. We create thunks during application (notice the call to the ref constructor), recursive definitions and pattern matching, because those are constructs that bind variables.

acfoltzer · Answer

Here are two call-by-need interpreters; one in Haskell, and one in Scheme. The key to both is that you can suspend evaluation inside procedures of no arguments (thunks). Whether your host language is call-by-need (Haskell) or call-by-value (Scheme, ML), lambda forms are considered values, so nothing under the lambda will be evaluated until the thunk is applied.

So, when an interpreted function is applied to an argument, you just wrap the unevaluated syntactic representation of the argument in a new thunk. Then, when you come across a variable, you look it up in the environment and promptly evaluate the thunk, giving you the value of the argument

Simply getting to this point makes your interpreter lazy, since arguments are not actually evaluated until they're used; this is a call-by-name interpreter. As you point out, though, an efficient lazy language will evaluate these arguments only once; such a language is call-by-need. To get this efficiency, you update the environment to instead contain a thunk containing just the value of the argument, rather than the entire argument expression.

The first interpreter here is in Haskell, and is fairly similar to the ML code you pasted. Of course, the challenges in Haskell are to 1) not trivially implement laziness, thanks to Haskell's built-in laziness, and 2) wrangle the side-effects into the code. Haskell's IORefs are used to allow the environment to be updated.

module Interp where

import Data.IORef

data Expr = ExprBool Bool
          | ExprInt Integer
          | ExprVar String
          | ExprZeroP Expr
          | ExprSub1 Expr
          | ExprMult Expr Expr
          | ExprIf Expr Expr Expr
          | ExprLam String Expr
          | ExprApp Expr Expr
          deriving (Show)

data Val = ValBool Bool                   
         | ValInt Integer
         | ValClos ((() -> IO Val) -> IO Val)

instance Show Val where
  show (ValBool b) = show b
  show (ValInt n) = show n
  show (ValClos c) = "Closure"

data Envr = EnvrEmpty                   
          | EnvrExt String (IORef (() -> IO Val)) Envr

applyEnv :: Envr -> String -> IO (IORef (() -> IO Val))
applyEnv EnvrEmpty y = error $ "unbound variable " ++ y
applyEnv (EnvrExt x v env) y =
  if x == y 
  then return v
  else applyEnv env y

eval :: Expr -> Envr -> IO Val            
eval exp env = case exp of
  (ExprBool b) -> return $ ValBool b
  (ExprInt n) -> return $ ValInt n
  (ExprVar y) -> do
    thRef <- applyEnv env y
    th <- readIORef thRef
    v <- th ()
    writeIORef thRef (\() -> return v)
    return v
  (ExprZeroP e) -> do
    (ValInt n) <- eval e env
    return $ ValBool (n == 0)
  (ExprSub1 e) -> do
    (ValInt n) <- eval e env 
    return $ ValInt (n - 1)
  (ExprMult e1 e2) -> do
    (ValInt n1) <- eval e1 env
    (ValInt n2) <- eval e2 env
    return $ ValInt (n1 * n2)
  (ExprIf te ce ae) -> do
    (ValBool t) <- eval te env
    if t then eval ce env else eval ae env
  (ExprLam x body) ->
    return $ ValClos (\a -> do
                         a' <- newIORef a
                         eval body (EnvrExt x a' env))
  (ExprApp rator rand) -> do
    (ValClos c) <- eval rator env 
    c (\() -> eval rand env)

-- "poor man's Y" factorial definition      
fact = ExprApp f f
  where f = (ExprLam "f" (ExprLam "n" (ExprIf (ExprZeroP (ExprVar "n"))
                                       (ExprInt 1)
                                       (ExprMult (ExprVar "n")
                                        (ExprApp (ExprApp (ExprVar "f")
                                                  (ExprVar "f"))
                                         (ExprSub1 (ExprVar "n")))))))

-- test factorial 5 = 120            
testFact5 = eval (ExprApp fact (ExprInt 5)) EnvrEmpty            

-- Omega, the delightful infinite loop
omega = ExprApp (ExprLam "x" (ExprApp (ExprVar "x") (ExprVar "x")))
                (ExprLam "x" (ExprApp (ExprVar "x") (ExprVar "x")))

-- show that ((\y -> 5) omega) does not diverge, because the 
-- interpreter is lazy
testOmega = eval (ExprApp (ExprLam "y" (ExprInt 5)) omega) EnvrEmpty

The second interpreter is in Scheme, where the only real boilerplate is Oleg's pattern-matching macro. I find that it's much easier to see where the laziness is coming from in the Scheme version. The box functions allow the environment to be updated; Chez Scheme includes them, but I've included definitions that should work for others.

(define box
  (lambda (x)
    (cons x '())))

(define unbox
  (lambda (b)
    (car b)))

(define set-box!
  (lambda (b v)
    (set-car! b v)))

;; Oleg Kiselyov's linear pattern matcher
(define-syntax pmatch
  (syntax-rules (else guard)
    ((_ (rator rand ...) cs ...)
     (let ((v (rator rand ...)))
       (pmatch v cs ...)))
    ((_ v) (errorf 'pmatch "failed: ~s" v))
    ((_ v (else e0 e ...)) (begin e0 e ...))
    ((_ v (pat (guard g ...) e0 e ...) cs ...)
     (let ((fk (lambda () (pmatch v cs ...))))
       (ppat v pat (if (and g ...) (begin e0 e ...) (fk)) (fk))))
    ((_ v (pat e0 e ...) cs ...)
     (let ((fk (lambda () (pmatch v cs ...))))
       (ppat v pat (begin e0 e ...) (fk))))))

(define-syntax ppat
  (syntax-rules (uscore quote unquote)
    ((_ v uscore kt kf)
     ; _ can't be listed in literals list in R6RS Scheme
     (and (identifier? #'uscore) (free-identifier=? #'uscore #'_))
     kt)
    ((_ v () kt kf) (if (null? v) kt kf))
    ((_ v (quote lit) kt kf) (if (equal? v (quote lit)) kt kf))
    ((_ v (unquote var) kt kf) (let ((var v)) kt))
    ((_ v (x . y) kt kf)
     (if (pair? v)
       (let ((vx (car v)) (vy (cdr v)))
     (ppat vx x (ppat vy y kt kf) kf))
       kf))
    ((_ v lit kt kf) (if (equal? v (quote lit)) kt kf))))

(define empty-env
  (lambda ()
    `(empty-env)))

(define extend-env
  (lambda (x v env)
    `(extend-env ,x ,v ,env)))

(define apply-env
  (lambda (env y)
    (pmatch env
      [(extend-env ,x ,v ,env)
       (if (eq? x y)
           v
           (apply-env env y))])))

(define value-of
  (lambda (exp env)
    (pmatch exp
      [,b (guard (boolean? b)) b]
      [,n (guard (integer? n)) n]
      [,y (guard (symbol? y))
       (let* ([box (apply-env env y)]
              [th (unbox box)]
              [v (th)])
         (begin (set-box! box (lambda () v)) v))]
      [(zero? ,e) (zero? (value-of e env))]
      [(sub1 ,e) (sub1 (value-of e env))]
      [(* ,e1 ,e2) (* (value-of e1 env) (value-of e2 env))]
      [(if ,t ,c ,a) (if (value-of t env)
                         (value-of c env)
                         (value-of a env))]
      [(lambda (,x) ,body)
       (lambda (a) (value-of body (extend-env x a env)))]
      [(,rator ,rand) ((value-of rator env)
                       (box (lambda () (value-of rand env))))])))

;; "poor man's Y" factorial definition
(define fact
  (let ([f '(lambda (f)
              (lambda (n)
                (if (zero? n)
                    1
                    (* n ((f f) (sub1 n))))))])
    `(,f ,f)))

;; test factorial 5 = 120
(define testFact5
  (lambda ()
    (value-of `(,fact 5) (empty-env))))

;; Omega, the delightful infinite loop
(define omega
  '((lambda (x) (x x)) (lambda (x) (x x))))

;; show that ((lambda (y) 5) omega) does not diverge, because the interpreter
;; is lazy
(define testOmega
  (lambda ()
    (value-of `((lambda (y) 5) ,omega) (empty-env))))

augustss · Answer

You should have a look at graph reduction using combinators (SKI). It's beautiful and simple and illustrates how lazy evaluation works.

Cactus · Answer

You might be interested in Alef (Alef Lazily Evaluates Functions), which is a very simple, pure, lazy functional programming language that I originally created specifically for explaining lazy evaluation via graph reduction. It is implemented in less than 500 lines of Common Lisp, including some neat visualization functions. http://gergo.erdi.hu/blog/2013-02-17-write_yourself_a_haskell..._in_lisp/

Unfortunately, I haven't gotten around to finishing 'Typecheck Yourself a Haskell... in Lisp' yet, even though most of the code was already written around the time I posted part 1.

Implementing lazy functional languages

Tags:

haskell

rwallace

4 Answers

Edward Z. Yang

acfoltzer

augustss

Cactus

Recent Activity

Donate For Us

Implementing lazy functional languages

Tags:

haskell

rwallace

4 Answers

Edward Z. Yang

acfoltzer

augustss

Cactus

Related questions

Recent Activity

Donate For Us