Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How fast is Data.Sequence.Seq compared to []?

Clearly Seq asymptotically performs the same or better as [] for all possible operations. But since its structure is more complicated than lists, for small sizes its constant overhead will probably make it slower. I'd like to know how much, in particular:

  1. How much slower is <| compared to :?
  2. How much slower is folding over/traversing Seq compared to folding over/traversing [] (excluding the cost of a folding/traversing function)?
  3. What is the size (approximately) for which \xs x -> xs ++ [x] becomes slower than |>?
  4. What is the size (approximately) for which ++ becomes slower than ><?
  5. What's the cost of calling viewl and pattern matching on the result compared to pattern matching on a list?
  6. How much memory does an n-element Seq occupy compared to an n-element list? (Not counting the memory occupied by the elements, only the structure.)

I know that it's difficult to measure, since with Seq we talk about amortized complexity, but I'd like to know at least some rough numbers.

like image 662
Petr Avatar asked Feb 10 '13 14:02

Petr


2 Answers

This should be a start - http://www.haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists

A sequence uses between 5/6 and 4/3 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary. Heavy use of split and append will result in sequences using approximately the same space as lists. In detail:

  • a list of length n consists of n cons nodes, each occupying 3 words.
  • a sequence of length n has approximately n/(k-1) nodes, where k is the average arity of the internal nodes (each 2 or 3). There is a pointer, a size and overhead for each node, plus a pointer for each element, i.e. n(3/(k-1) + 1) words.

List is a non-trivial constant-factor faster for operations at the head (cons and head), making it a more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.

like image 87
manojlds Avatar answered Oct 23 '22 06:10

manojlds


I have one more concrete result to add to above answer. I am solving a Langevin equation. I used List and Data.Sequence. A lot of insertions at back of list/sequence are going on in this solution.

To sum up, I did not see any improvement in speed, in fact performance deteriorated with Sequences. Moreover with Data.Sequence, I need to increase the memory available for Haskell RTS.

Since I am definitely not an authority on optimizing; I post the both cases below. I'd be glad to know if this can be improved. Both codes were compiled with -O2 flag.

  1. Solution with List, takes approx 13.01 sec
  2. Solution with Data.Sequence, takes approx 15.13 sec
like image 1
Dilawar Avatar answered Oct 23 '22 08:10

Dilawar