Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is it okay to modify a variable in functional languages?

So I'm teaching myself functional programming using Racket Scheme, and I love it so far. As an exercise for myself, I've been trying to implement a few simple tasks in a purely functional way. I know immutability is an important part of functional style, but I was wondering if there are any times when it's okay.

I thought of a fun way for a function to strip non-unique strings from a list of strings when used with filter, shown here:

(define (make-uniquer)
  (let ([uniques '()])
    (lambda (x)
      (if (not (member x uniques))
          (set! uniques (cons x uniques))
          #f))))

(define (uniquify x)
  (let ([uniquer (make-uniquer)])
    (filter uniquer x)))

As you can see, make-uniquer returns a closure over a list of strings to compare against for uniqueness, this way it can act as a simple predicate for filter. But I'm destructively updating the closed-over list. Is this bad form, or is it okay to change local closed-over variables in such ways?

like image 714
Dutch Avatar asked Aug 21 '12 00:08

Dutch


2 Answers

Pure and impure functional programming

Pure functions are inherently referentially transparent, which allows memoization (caching the result). Lack of mutable state permits reentrancy, allows different versions of linked data structures to share memory, and makes automatic parallelization possible. The point is that by restricting yourself from mutating state, you no longer have to think about many complicated issues of imperative programming.

However, this restriction has drawbacks. One is performance: some algorithms and data structures (like building a hash table) simply can't be expressed as pure functions without having to copy around large amounts of data. Another: Compare with Haskell, a pure functional language. Since mutation does not exist (conceptually), you have to represent state changes using monads. (Though Haskell provides a reasonably concise do-notation syntactic sugar, programming within a state monad is quite a different language from "regular" Haskell!) If your algorithm is most easily expressed using several interlocking loops that change state, a Haskell implementation will be clunkier than what's possible in an impure language.

An example is changing a single node deeply nested within an XML document. It's possible but more difficult without state mutation, using zipper data structures. Example pseudocode (pure):

old_xml = <a><b><c><d><e><f><g><h attrib="oldvalue"/></g></f></e></d></c></b></a>
// '\' is the XML selection operator
node_to_change = orig_xml \ "a" \ "b" \ "c" \ "d" \ "e" \ "f" \ "g" \ "h"
node_changed = node_to_change.copy("attrib" -> "newvalue")
new_xml = node_changed.unselect().unselect().unselect().unselect()
                      .unselect().unselect().unselect().unselect()
return new_xml

Example (impure):

xml = <a><b><c><d><e><f><g><h attrib="oldvalue"/></g></f></e></d></c></b></a>
node_to_change = orig_xml.select_by_xpath("/a/b/c/d/e/f/g/h")
node_to_change.set("attrib" -> "newvalue")
return xml    // xml has already been updated

For more information on purely functional data structures, see https://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki. (Further, it's often possible to write a procedural function that only manipulates internal state, so that it can be wrapped up so that it is effectively a pure function to its callers. This is a bit easier in an impure language because you don't have to write it in a state monad and pass it to runST.)

Though writing in an impure style you lose these benefits, some of the other conveniences of functional programming (like higher-order functions) still apply.

Using mutation

Lisp is an impure functional language, meaning that it permits state mutation. This is by design, so that if you need mutation you can use it without practically having to use a different language.

Generally, yes, it's fine to use state mutation when

  • it's needed for performance reasons, or
  • your algorithm can be expressed more clearly using mutation.

When you do so:

  • Clearly document that your uniquify function will mutate the list you pass to it. It would be nasty for the caller to pass your function a variable and have it come back changed!
  • If your application is multithreaded, analyze, be aware of, and document whether your impure function is thread-safe.
like image 125
Mechanical snail Avatar answered Sep 19 '22 12:09

Mechanical snail


In this case, I'd avoid a mutable implementation because a functional implementation can compete pretty well in terms of performance. Here are three versions (including the built-in remove-duplicates) of the function:

#lang racket

(define (make-uniquer)
  (let ([uniques (make-hash)])
    (lambda (x)
      (if (not (hash-ref uniques x #f))
          (hash-set! uniques x #t)
          #f))))

(define (uniquify x)
  (let ([uniquer (make-uniquer)])
    (filter uniquer x)))

(define (uniquify-2 lst)
  (define-values (_ r)
   (for/fold ([found (hash)] [result '()])
             ([elem (in-list lst)])
     (cond [(hash-ref found elem #f)
            (values found result)]
           [else (values (hash-set found elem #t)
                         (cons elem result))])))
  (reverse r))

(define randoms (build-list 100000 (λ (n) (random 10))))
(time (for ([i 100]) (uniquify randoms)))
(time (for ([i 100]) (uniquify-2 randoms)))
(time (for ([i 100]) (remove-duplicates randoms)))

;; sanity check
(require rackunit)
(check-equal? (uniquify randoms) (uniquify-2 randoms))
(check-equal? (uniquify randoms) (remove-duplicates randoms))

The times I get for this are

cpu time: 1348 real time: 1351 gc time: 0
cpu time: 1016 real time: 1019 gc time: 32
cpu time: 760 real time: 760 gc time: 0

Not scientific numbers, so take this with a grain of salt. To be fair, I did tune uniquify-2 a bit since my first version was slower. I also improved the mutable version with a hash table, but maybe there are other optimizations that could be applied. Also, the built-in remove-duplicates is tuned for performance and does use a mutable data structure (though it avoids set!).

You may also be interested in the Guide entry on performance. It points out that use of set! can harm performance, so use it with care.

like image 32
Asumu Takikawa Avatar answered Sep 21 '22 12:09

Asumu Takikawa