Data.Sequence vs. Data.DList for appending data to the end of the list

I'm writing some code that needs to frequently append to the end of a list. I know that using "++" is inefficient, so instead I build the list up backwards by consing onto the head, and then reverse it when I'm done. I gather that this is a common beginner tactic.
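For comparison, here is a minimal sketch of both approaches. It assumes a hypothetical loop that collects (x, x*x) pairs; the names collectAppend and collectReverse are illustrative only, and plain tuples are used for brevity.

```haskell
import Data.List (foldl')

-- Appending with (++) nests the appends to the left, so every element
-- is re-traversed by each later append: O(n^2) time for n elements.
collectAppend :: [Int] -> [(Int, Int)]
collectAppend = foldl' (\acc x -> acc ++ [(x, x * x)]) []

-- The "cons, then reverse" tactic: each (:) is O(1) and the single
-- reverse at the end is O(n), so the whole pass is O(n).
collectReverse :: [Int] -> [(Int, Int)]
collectReverse = reverse . foldl' (\acc x -> (x, x * x) : acc) []
```

Both produce the same list; only the cost differs.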

I would rather build it up in the correct order to begin with - but that means switching to a new data structure. I'm considering using Data.Sequence or Data.DList for my container. My list consists of strict int pairs, and I don't need random access to it. What are the relative merits of Data.Sequence and Data.DList, and are there other containers I should consider?
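A rough sketch of building the pairs in order with each candidate. IntPair, buildSeq, buildDL and the DL type are hypothetical names. Because Data.DList lives in the separate dlist package, the difference list below is a hand-rolled stand-in for the same idea, not that package's API.

```haskell
import Data.Foldable (foldl', toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- A pair type with strict fields, standing in for "strict int pairs".
data IntPair = IntPair !Int !Int
  deriving (Eq, Show)

-- Option 1: Data.Sequence. (|>) adds an element at the right end
-- cheaply, so the pairs come out in order with no final reverse.
buildSeq :: [Int] -> Seq IntPair
buildSeq = foldl' (\s x -> s |> IntPair x (x * x)) Seq.empty

-- Option 2: a difference list, i.e. a list represented by the
-- function that prepends it (hand-rolled sketch).
newtype DL a = DL ([a] -> [a])

emptyDL :: DL a
emptyDL = DL id

-- Appending one element is just function composition: O(1).
snocDL :: DL a -> a -> DL a
snocDL (DL f) x = DL (f . (x :))

-- Materialize the list once, at the end, by applying the function to [].
toListDL :: DL a -> [a]
toListDL (DL f) = f []

buildDL :: [Int] -> [IntPair]
buildDL = toListDL . foldl' (\d x -> snocDL d (IntPair x (x * x))) emptyDL
```

The dlist package offers comparable operations (for example snoc and toList) on its own DList type.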

nont asked Jun 13 '11 13:06


1 Answer

Whether to use Data.Sequence or DList depends on how you are going to be using the resulting list. DList is great when you are building up a sequence, say in a Writer computation, to convert to a list at the end and use it. However, if you need to use the intermediate results, like, say:

f (foo ++ bar)
+ f (foo ++ bar ++ baz)
+ f (foo ++ bar ++ baz ++ quux)

then DList is pretty bad, because it needs to recompute the spine each time. Data.Sequence is a better choice in this situation. Data.Sequence is also better if you need to remove elements from the sequence.
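A small sketch of why Data.Sequence suits the intermediate-results and removal cases. Every intermediate Seq is an ordinary persistent value you can inspect directly, whereas a DList would first have to be converted back to a list. Here Seq.length (O(1) on a Seq) plays the role of the hypothetical f; stageLengths, dropFront and removeAt are illustrative names.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, ViewL (..), (|>))
import qualified Data.Sequence as Seq

-- Apply "f" (here Seq.length) to every stage of the build. Each stage
-- shares structure with the previous one; nothing is rebuilt.
stageLengths :: [Int] -> [Int]
stageLengths xs = map Seq.length (drop 1 (scanl (|>) Seq.empty xs))

-- Remove the first element, if there is one.
dropFront :: Seq a -> Seq a
dropFront s = case Seq.viewl s of
  EmptyL    -> Seq.empty
  _ :< rest -> rest

-- Delete by index; Seq.deleteAt leaves the sequence unchanged
-- when the index is out of range.
removeAt :: Int -> Seq a -> Seq a
removeAt = Seq.deleteAt
```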

But maybe you don't even need to make this decision. Reversing lists at the end of a computation is common in strict functional languages like ML and Scheme, but not in Haskell. Take, for example, these two ways of writing map:

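map_i :: (a -> b) -> [a] -> [b]
map_i f xs = reverse $ go [] xs
  where
    -- tail recursive: the result accumulates backwards, then is reversed
    go accum [] = accum
    go accum (y:ys) = go (f y : accum) ys

map_ii :: (a -> b) -> [a] -> [b]
map_ii _ [] = []
-- guarded recursion: the recursive call sits under the (:) constructor
map_ii f (x:xs) = f x : map_ii f xs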

In a strict language, map_ii would be horrible because it uses linear stack space, whereas map_i is tail recursive. But because Haskell is lazy, map_i is the inefficient one. map_ii can consume one element of the input and yield one element of the output, whereas map_i consumes the whole input before yielding any output.
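To make the laziness point concrete, this sketch re-declares the two definitions under hypothetical names (mapAcc mirrors map_i, mapCorec mirrors map_ii) so it compiles on its own, and runs the corecursive one on an infinite list.

```haskell
-- Accumulate backwards, then reverse (like map_i).
mapAcc :: (a -> b) -> [a] -> [b]
mapAcc f xs0 = reverse (go [] xs0)
  where
    go acc []       = acc
    go acc (y : ys) = go (f y : acc) ys

-- Emit each output element before recursing (like map_ii).
mapCorec :: (a -> b) -> [a] -> [b]
mapCorec _ []       = []
mapCorec f (y : ys) = f y : mapCorec f ys

-- mapCorec yields output incrementally, so this terminates.
firstThree :: [Int]
firstThree = take 3 (mapCorec (* 2) [1 ..])
-- take 3 (mapAcc (* 2) [1 ..]) would never return: reverse has to
-- reach the end of the input before it can produce anything.
```

Here firstThree evaluates to [2,4,6]; on finite input both functions agree.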

Tail recursion isn't the holy grail of efficient implementation in Haskell. When producing a data structure like a list, you actually want to be corecursive; that is, make the recursive call underneath an application of a constructor (e.g. f x : map_ii f xs above).

So if you find yourself reversing the result of a tail-recursive function, see if you can factor the whole lot into a corecursive function.
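As an illustration of that refactoring, here is a sketch using a hypothetical task close to the question's: pair each input Int with the running total so far. runningPairsAcc is the reverse-at-the-end version; runningPairs is the corecursive rewrite.

```haskell
-- Tail-recursive version: accumulate backwards, reverse at the end.
runningPairsAcc :: [Int] -> [(Int, Int)]
runningPairsAcc = reverse . go 0 []
  where
    go _ acc [] = acc
    go total acc (x : xs) =
      let t = total + x
      in t `seq` go t ((x, t) : acc) xs

-- Corecursive version: each pair is emitted under (:) before the
-- recursive call, so output appears as soon as input is read.
runningPairs :: [Int] -> [(Int, Int)]
runningPairs = go 0
  where
    go _ [] = []
    -- seq forces the running total so no chain of (+) thunks builds up
    go total (x : xs) =
      let t = total + x
      in t `seq` ((x, t) : go t xs)
```

Both give [(1,1),(2,3),(3,6)] for [1,2,3], but only runningPairs works on an infinite input. For this particular shape, zip xs (scanl1 (+) xs) from the Prelude is itself a corecursive one-liner.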

luqui answered Nov 04 '22 13:11