Consider this bit of Chez Scheme code:
(import (chezscheme))

(define (list-enumerate ls val proc)
  (let loop ((ls ls) (return? #f) (val val))
    (if (or (null? ls) return?)
        val
        (call-with-values
          (lambda () (proc val (car ls)))
          (lambda (return? val)
            (loop (cdr ls) return? val))))))

(define (list-index ls proc)
  (list-enumerate ls 0
    (lambda (i elt)
      (if (proc elt)
          (values #t i)
          (values #f (+ i 1))))))

(define n 100000)
(define data (iota n))

(time (list-index data (lambda (elt) (= elt (- n 1)))))
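For readers new to the multiple-values style above, here is a small illustrative sketch (my own, not part of the original script) of how list-index behaves: list-enumerate's first returned value says whether to stop, and the second carries the running index. Note that when no element satisfies proc, list-index returns the length of the list rather than #f:

(list-index '(10 20 30) (lambda (x) (> x 15)))  ; => 1
(list-index '(10 20 30) (lambda (x) (> x 99)))  ; => 3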
Run it:
~ $ scheme --script ~/scratch/_list-enumerate-allocation-test-chez-a.sps
(time (list-index data ...))
    no collections
    3 ms elapsed cpu time
    4 ms elapsed real time
    8 bytes allocated
Wow, it reports that only 8 bytes were allocated.
Let's run it again using the --program option instead of --script:
~ $ scheme --program ~/scratch/_list-enumerate-allocation-test-chez-a.sps
(time (list-index data ...))
    no collections
    3 ms elapsed cpu time
    3 ms elapsed real time
    800000 bytes allocated
Yikes, 800000 bytes allocated.
What's up with the difference?
Ed
Here's a note from Kent Dybvig in response:
That's an interesting question.
When run with --script, which uses the REPL semantics, the variables defined in the script, like list-enumerate and list-index, are mutable, which inhibits interprocedural optimizations, including inlining. When run with --program, however, the variables are immutable, which allows interprocedural optimizations.
In this case, --program allows the compiler to inline list-enumerate into list-index's body and in turn the lambda expression within list-index's body into list-enumerate's body. The end result is a conditional expression within the call-with-values producer expression. This causes the compiler to create a closure for the consumer, to avoid code duplication along the then and else branches of the conditional. This closure is created each time through list-enumerate's loop, resulting in the extra allocation overhead. (The reported 800000 bytes over 100000 loop iterations works out to 8 bytes per iteration.) That's the way optimizations often go. Mostly you win, but sometimes you lose. The good news is, on balance, the benefits outweigh the costs, even in your program. I put the call to list-index in a loop (the modified code is below) and found that with --program, the code runs about 30% faster.
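To make the mechanism concrete, here is a rough hand-written sketch (my own reconstruction, not actual compiler output) of the shape the loop body takes after the inlining Kent describes. Because the producer is now a conditional, the consumer lambda cannot be inlined into both branches without duplicating its code, so a fresh closure for it is allocated on every iteration:

(call-with-values
  (lambda ()                        ; producer: proc's body inlined to a conditional
    (if (proc (car ls))
        (values #t val)
        (values #f (+ val 1))))
  (lambda (return? val)             ; consumer: a closure allocated each iteration
    (loop (cdr ls) return? val)))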
Kent
(import (chezscheme))

(define (list-enumerate ls val proc)
  (let loop ((ls ls) (return? #f) (val val))
    (if (or (null? ls) return?)
        val
        (call-with-values
          (lambda () (proc val (car ls)))
          (lambda (return? val)
            (loop (cdr ls) return? val))))))

(define (list-index ls proc)
  (list-enumerate ls 0
    (lambda (i elt)
      (if (proc elt)
          (values #t i)
          (values #f (+ i 1))))))

(define n 100000)
(define data (time (iota n)))

(let ()
  (define runalot
    (lambda (i thunk)
      (let loop ([i i])
        (let ([x (thunk)])
          (if (fx= i 1) x (loop (fx- i 1)))))))
  (time
    (runalot 1000
      (lambda ()
        (list-index data (lambda (elt) (= elt (- n 1))))))))