 

How is memory-efficient non-destructive manipulation of collections achieved in functional programming?


I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, i.e. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory. (Even if the original collection were garbage-collected, I'd expect the memory footprint and general performance of such a collection to be awful.)

This is how far I've got until now:

Using F#, I came up with a function insert that splits a list into two pieces and introduces a new element in-between, seemingly without cloning all unchanged elements:

// return a list without its first n elements:
// (helper function)
let rec skip list n =
    if n = 0 then
        list
    else
        match list with
        | []    -> []
        | x::xs -> skip xs (n-1)

// return only the first n elements of a list:
// (helper function)
let rec take list n =
    if n = 0 then
        []
    else
        match list with
        | []    -> []
        | x::xs -> x::(take xs (n-1))

// insert a value into a list at the specified zero-based position:
let insert list position value =
    (take list position) @ [value] @ (skip list position)

I then checked whether objects from an original list are "recycled" in new lists by using .NET's Object.ReferenceEquals:

open System

let (===) x y =
    Object.ReferenceEquals(x, y)

let x = Some(42)
let L = [Some(0); x; Some(43)]
let M = Some(1) |> insert L 1

The following three expressions all evaluate to true, indicating that the value referred to by x is re-used in both lists L and M, i.e. that there is only one copy of this value in memory:

L.[1] === x
M.[2] === x
L.[1] === M.[2]

My question:

Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

(Btw.: I know about Chris Okasaki's book Purely functional data structures, but haven't yet had the time to read it thoroughly.)

stakx - no longer contributing, asked Jan 03 '10



2 Answers

I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, i.e. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.

This page has a few descriptions and implementations of data structures in F#. Most of them come from Okasaki's Purely Functional Data Structures, although the AVL tree is my own implementation since it wasn't present in the book.

Now, since you asked about reusing unmodified nodes, let's take a simple binary tree:

type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

let rec insert v = function
    | Node(l, x, r) as node ->
        if v < x then Node(insert v l, x, r)    // reuses x and r
        elif v > x then Node(l, x, insert v r)  // reuses x and l
        else node
    | Nil -> Node(Nil, v, Nil)

Note that we re-use some of our nodes. Let's say we start with this tree:

When we insert an e into the tree, we get a brand new tree, with some of the nodes pointing back at our original tree:

If we don't have a reference to the original tree above, then .NET will garbage-collect any nodes without live references, specifically the d, g and f nodes.

Notice that we've only modified nodes along the path of our inserted node. This is pretty typical in most immutable data structures, including lists. So, the number of nodes we create is exactly equal to the number of nodes we need to traverse in order to insert into our data structure.
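As a quick sanity check (this snippet is not part of the original answer; it reuses the 'a tree type and insert function above together with the same reference-equality test the question's (===) operator wraps, and the letter values are made up), we can verify that an insert on one side leaves the other subtree shared by reference:

// build a small tree by hand, then insert a value that goes to the right:
let leftSubtree = Node(Nil, "b", Nil)
let original    = Node(leftSubtree, "d", Node(Nil, "f", Nil))
let updated     = insert "e" original   // "e" > "d", so only the right spine is rebuilt

// the untouched left subtree is literally the same object in both trees:
let leftIsShared =
    match original, updated with
    | Node(l1, _, _), Node(l2, _, _) -> System.Object.ReferenceEquals(l1, l2)  // true
    | _                              -> false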

Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

Yes.

Lists, however, aren't a very good data structure, since most non-trivial operations on them require O(n) time.

Balanced binary trees support O(log n) inserts, meaning we create O(log n) copies on every insert. Since log2(10^15) is ~= 50, the overhead is very very tiny for these particular data structures. Even if you keep around every copy of every object after inserts/deletes, your memory usage will increase at a rate of O(n log n) -- very reasonable, in my opinion.

answered by Juliet


How it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.

This works because no matter what kind of collection, the pointers to the elements are stored separately from the elements themselves. (Exception: some compilers will optimize some of the time, but they know what they are doing.) So for example, you can have two lists that differ only in the first element and share tails:

let shared = ["two", "three", "four"]
let l      = "one" :: shared
let l'     = "1a"  :: shared

These two lists have the shared part in common and their first elements different. What's less obvious is that each list also begins with a unique pair, often called a "cons cell":

  • List l begins with a pair containing a pointer to "one" and a pointer to the shared tail.

  • List l' begins with a pair containing a pointer to "1a" and a pointer to the shared tail.

If we had only declared l and wanted to alter or remove the first element to get l', we'd do this:

let l' = match l with
         | _ :: rest -> "1a" :: rest
         | []        ->  raise (Failure "cannot alter 1st elem of empty list")

There is constant cost:

  • Split l into its head and tail by examining the cons cell.

  • Allocate a new cons cell pointing to "1a" and the tail.

The new cons cell becomes the value of list l'.
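To make the sharing visible (this check is not part of the original answer; it just applies .NET's Object.ReferenceEquals, as the question does, to the lists from above, repeated here so the snippet stands alone):

let shared = ["two"; "three"; "four"]
let l      = "one" :: shared
let l'     = "1a"  :: shared

// both tails are the very same cons cell as 'shared':
let tailOfLIsShared = System.Object.ReferenceEquals(List.tail l, shared)        // true
let tailsAreShared  = System.Object.ReferenceEquals(List.tail l, List.tail l')  // true
// ...while the two head cons cells are distinct objects:
let headsAreShared  = System.Object.ReferenceEquals(l, l')                      // false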

If you're making point-like changes in the middle of a big collection, typically you'll be using some sort of balanced tree which uses logarithmic time and space. Less frequently you may use a more sophisticated data structure:

  • Gerard Huet's zipper can be defined for just about any tree-like data structure and can be used to traverse and make point-like modifications at constant cost. Zippers are easy to understand (a minimal list-based sketch follows this list).

  • Paterson and Hinze's finger trees offer very sophisticated representations of sequences, which among other tricks enable you to change elements in the middle efficiently—but they are hard to understand.
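The original answer doesn't include any zipper code, but here is a minimal, hypothetical sketch of the idea for the simplest sequence type, a list: the focus position is represented by the (reversed) prefix before it and the suffix from it onward, so a point-like edit allocates only a constant number of new cells while everything else is shared, just like the cons-cell example above.

// hypothetical minimal list zipper (all names are made up for illustration):
type 'a ListZipper = { Before: 'a list; After: 'a list }   // Before is stored reversed

let ofList xs = { Before = []; After = xs }

// move the focus one step to the right (no-op at the end of the list):
let right z =
    match z.After with
    | x :: rest -> { Before = x :: z.Before; After = rest }
    | []        -> z

// replace the focused element: O(1) allocation; Before and the tail of After
// are shared with the old zipper:
let setFocus v z =
    match z.After with
    | _ :: rest -> { z with After = v :: rest }
    | []        -> z

// rebuild an ordinary list from the zipper:
let toList z = List.rev z.Before @ z.After

// example: change the third element of a five-element list
let edited =
    ofList [1; 2; 3; 4; 5]
    |> right |> right        // focus on the 3
    |> setFocus 30
    |> toList                // [1; 2; 30; 4; 5]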

answered by Norman Ramsey