
Haskell - why would I use infinite data structures?

In Haskell, it is possible to define infinite lists like so:

[1..]

I found many articles which describe how to implement infinite lists, and I understand how this works. However, I can't think of any reason to use the concept of infinite data structures.

Can someone give me an example of a problem which can be solved more easily (or perhaps only) with an infinite list in Haskell?

asked Jun 12 '20 by Moritz Schmidt

People also ask

Can Haskell have infinite list?

As you all know, lists in Haskell can be infinite. This capability is a natural consequence of Haskell's lazy evaluation. An infinite list in Haskell is no problem since Haskell will lazily generate only as much of the list as is needed, when it is needed.
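
For instance (an illustrative GHCi session):

> take 5 [1..]
[1,2,3,4,5]

Only the first five cells of the infinite list are ever constructed.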

How do you make an infinite list in Haskell?

In Haskell, you only need to allocate space for the items in the list that actually get evaluated. The two most basic functions for creating an infinite list are "repeat" and "cycle". The first repeats a single element infinitely, while the second cycles through a specific series of elements.
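
For example (an illustrative GHCi session):

> take 5 (repeat 'x')
"xxxxx"
> take 7 (cycle [1,2,3])
[1,2,3,1,2,3,1]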

Can lists be infinite?

Infinite lists, whose elements are evaluated upon demand, can be implemented using functions as data. The tail of a lazy list is a function that, if called, produces another lazy list. A lazy list can be infinitely long, and any finite number of its elements can be evaluated.
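
Haskell's laziness makes this encoding implicit, but it can be spelled out directly; a minimal sketch (LazyList, ones, and takeL are illustrative names):

data LazyList a = Nil | Cons a (() -> LazyList a)   -- the tail is a function

ones :: LazyList Int
ones = Cons 1 (\() -> ones)   -- infinitely long; tails are produced on demand

takeL :: Int -> LazyList a -> [a]
takeL n _ | n <= 0   = []
takeL _ Nil          = []
takeL n (Cons x tl)  = x : takeL (n - 1) (tl ())   -- force one more tail

-- takeL 3 ones == [1,1,1]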


2 Answers

The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).

For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
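
A minimal sketch of such a lazily consumable sort (quicksort-style; the names sortL and minimumL are illustrative, not the library implementation):

sortL :: Ord a => [a] -> [a]
sortL []     = []
sortL (p:xs) = sortL [x | x <- xs, x < p] ++ p : sortL [x | x <- xs, x >= p]

-- head (sortL xs) forces only the chain of left partitions: O(n) on
-- average, while forcing the entire result would cost O(n log n).
minimumL :: Ord a => [a] -> a
minimumL = head . sortL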

This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what happens is that each element is only processed one at a time: given [1, 2, 6], this basically proceeds as follows, with all the steps happening incrementally:

  • total = 0
  • 1 < 5 is true
  • 1 * 2 == 2
  • total = 0 + 2 = 2
  • 2 < 5 is true
  • 2 * 2 == 4
  • total = 2 + 4 = 6
  • 6 < 5 is false
  • result = 6

This is exactly how you would write a fast loop in an imperative language (pseudocode):

total = 0;
for x in xs {
  if (x < 5) {
    total = total + x * 2;
  }
}

Moreover, because of laziness, this code has constant memory usage while processing the list. And there is nothing special inside map or filter that makes this happen: they can be entirely independent.
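
A small demonstration of that claim, using an explicitly strict fold so the accumulator itself cannot build up thunks (a sketch; the list size is arbitrary):

import Data.List (foldl')

total :: Int
total = foldl' (+) 0 (map (* 2) (filter (< 5) [1 .. 10000000]))
-- each element flows through filter, map, and the fold one at a
-- time; no intermediate list is ever materialized in full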

For another example, the standard library function and computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it’s implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluating, simply because && is lazy in its right argument. Laziness gives you composition!
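
Because of that short-circuiting, and can even produce an answer from an infinite list:

> and (map (< 100) [1..])
False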

For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.

answered Sep 28 '22 by Jon Purdy


  • Annotating a list with its indices temporarily uses an infinite list of indices:

    zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
    
  • Memoizing functions while maintaining purity (in this case the transformation yields an exponential speedup, because the memo table is used recursively):

    fib = (memo !!)
        where
        memo = map fib' [0..]  -- cache of *all* fibonacci numbers (evaluated on demand) 
        fib' 0 = 0
        fib' 1 = 1
        fib' n = fib (n-1) + fib (n-2)
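
    For instance (an illustrative session), each value is computed at most once and then read back from the shared list:

    > map fib [0..10]
    [0,1,1,2,3,5,8,13,21,34,55]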
    
  • Purely mocking programs with side-effects (free monads)

    data IO a = Return a
              | GetChar (Char -> IO a)
              | PutChar Char (IO a)
    

    Potentially non-terminating programs are represented with infinite IO structures; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
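
    Such a program can be interpreted purely; a minimal sketch (run is an illustrative name, and in a real module the IO type above would have to avoid clashing with Prelude's IO, e.g. via import Prelude hiding (IO)):

    run :: IO a -> String -> (Maybe a, String)
    run (Return x)    _      = (Just x, "")
    run (GetChar k)   (c:cs) = run (k c) cs
    run (GetChar _)   []     = (Nothing, "")        -- ran out of input
    run (PutChar c m) input  = let (r, out) = run m input
                               in  (r, c : out)     -- output streams lazily

    Laziness means the output streams even for the infinite program above: with ys = PutChar 'y' ys, take 3 (snd (run ys "")) yields "yyy".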

  • Tries: if you define a type approximately like the following:

    data Trie a = Trie a (Trie a) (Trie a)
    

    it can represent an infinite collection of values of type a, indexed by the natural numbers. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in O(log n) time (a sketch of this indexing is given at the end of this example). This means you can do something like the following (using some functions from the inttrie library):

    findIndices :: [Integer] -> Trie [Integer]
    findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
    

    this builds an efficient "reverse lookup table": given any value in the list, it tells you at which indices that value occurs, and it both caches results and streams information as soon as it's available:

    -- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
    > table = findIndices (concat [ [0..n] | n <- [0..] ])
    > table `apply` 0
    [0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
    

    all from a one-line infinite fold.
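
    A sketch of the logarithmic indexing mentioned above, which walks the tree following the binary representation of the index (fromFunc and index are illustrative names, not the inttrie API):

    fromFunc :: (Integer -> a) -> Trie a
    fromFunc f = Trie (f 0)
                      (fromFunc (\k -> f (2*k + 1)))   -- odd indices
                      (fromFunc (\k -> f (2*k + 2)))   -- positive even indices

    index :: Trie a -> Integer -> a
    index (Trie x _ _) 0 = x
    index (Trie _ l r) n
      | odd n     = index l ((n - 1) `div` 2)
      | otherwise = index r ((n - 2) `div` 2)

    -- index (fromFunc f) n == f n, reached in O(log n) steps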

I'm sure there are more examples; there are so many cool things you can do.

answered Sep 28 '22 by luqui