Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between a cyclic list and an infinite list in haskell?

Tags:

Referencing @dfeuer's answer to this question: Least expensive way to construct cyclic list in Haskell, which says that using cyclic lists 'defeats' the garbage collector as it has to keep everything you've consumed from a cyclic list allocated till you drop the reference to any cons cells in the list.

Apparently in Haskell a cyclic list and an infinite list are two separate things. This blog (https://unspecified.wordpress.com/2010/03/30/a-doubly-linked-list-in-haskell/) says that if you implement cycle as follows:

cycle xs = xs ++ cycle xs

it is an infinite list, not a cyclic list. To make it cyclic you have to implement it like this (as is found in the Prelude source code):

cycle xs = xs' where xs' = xs ++ xs'

What exactly is the difference between these two implementations? And why is it that if you are holding onto one cons cell somewhere in a cyclic list, that the garbage collector has to keep everything before it allocated as well?

like image 989
Ramith Jayatilleka Avatar asked Aug 26 '14 05:08

Ramith Jayatilleka


People also ask

How do you make an infinite list in Haskell?

But in Haskell, you only need to allocate space for the items in the list that actually get evaluated. The two most basic functions for creating an infinite list are "repeat" and "cycle". The first makes an infinite number of the same element, while the second allows you to cycle through a specific series of elements.

Would it be possible to verify that the Haskell list of twin primes you constructed contains only finitely many items?

No, this is not possible. It would be impossible to write such a function, because you can have lists whose finiteness might be unknown: consider a recursive loop generating a list of all the twin primes it can find.


2 Answers

The difference is entirely in the memory representation. From the point of view of the semantics of the language, they're indistinguishable—you can't write a function that can tell them apart, so your two versions of cycle are considered two implementations of the same function (they're the exact same mapping of arguments to results). In fact, I don't know if the language definition guarantees that one of those is cyclical and the other infinite.

But anyway, let's bring out the ASCII art. Cyclical list:

   +----+----+                 +----+----+
   | x0 |   ----->   ...   --->| xn |    |
   +----+----+                 +----+-|--+
     ^                                |
     |                                |
     +--------------------------------+

Infinite list:

   +----+----+
   | x0 |   ----->  thunk that produces infinite list
   +----+----+

The thing with the cyclical list is that from every cons cell in the list there is a path to all of the others and itself. This means that from the point of view of the garbage collector, if one of the cons cells is reachable, then all are. In the plain infinite list, on the other hand, there aren't any cycles, so from a given cons cell only its successors are reachable.

Note that the infinite list representation is more powerful than the cyclical one, because the cyclical representation only works with lists that repeat after some number of elements. For example, the list of all prime numbers can be represented as an infinite list, but not as a cyclical one.

Note also that this distinction can be generalized into two distinct ways of implementing the fix function:

fix, fix' :: (a -> a) -> a
fix  f = let result = f result in result
fix' f = f (fix' f)

-- Circular version of cycle:
cycle  xs = fix (xs++)

-- Infinite list version of cycle:
cycle' xs = fix' (xs++)

The GHC libraries go for my fix definition. The way GHC compiles code means that the thunk created for result is used both as the result and the argument of the application of f. I.e., the thunk, when forced, will call the object code for f with the thunk itself as its argument, and replace the thunk's contents with the result.

like image 118
Luis Casillas Avatar answered Nov 18 '22 10:11

Luis Casillas


Cyclic lists and infinite lists are different operationally, but not semantically.

A cyclic list is literally a loop in memory - imagine a singly linked list with the pointers following a cycle - so takes up constant space. Because each cell in the list can be reached from any other cell, holding onto any one cell will cause the entire list to be held onto.

An infinite list will take up more and more space as you evaluate more of it. Earlier elements will be garbage collected if no longer needed, so programs that process it may still run in constant space, although the garbage collection overhead will be higher. If earlier elements in the list are needed, for example because you hold onto a reference to the head of the list, then the list will consume linear space as you evaluate it, and will eventually exhaust available memory.

The reason for this difference is that without optimisations, a typical Haskell implementation like GHC will allocate memory once for a value, like xs' in the second definition of cycle, but will repeatedly allocate memory for a function invocation, like cycle xs in the first definition.

In principle optimisations might turn one definition into the other, but because of the quite different performance characteristics, it's unlikely that this will happen in practice as compilers are generally quite conservative about making programs behave worse. In some cases the cyclic variant will be worse because of the garbage collection properties already mentioned.

like image 36
GS - Apologise to Monica Avatar answered Nov 18 '22 10:11

GS - Apologise to Monica