How lazy is Haskell's `++`?

I'm curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.

import Data.List
-- rotate left by n: put the first n elements after the rest
swapAt n = f . splitAt n where f (a,b) = b++a
-- only rotations starting at an occurrence of the minimal element can win
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x

I'd imagine that I should use Data.Vector rather than lists, since Data.Vector provides in-place operations and could presumably work by just manipulating indices into the original data. I shouldn't actually need to bother tracking the indices myself to avoid excess copying, right?

I'm curious how the ++ impacts the optimization, though. I'd imagine it produces a lazy thunk that never actually does the appending until the string is read that far. Ergo, a should never actually be appended onto b whenever minimum can eliminate that string early, e.g. because it begins with a letter late in the alphabet. Is this correct?
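Here is the kind of behaviour I have in mind, as a quick GHCi-style sketch (the literal strings are just placeholder values):

-- (++) only builds its result as far as it is demanded
ghci> head ("zzz" ++ undefined)
'z'

-- list comparison stops at the first differing element, so the appended
-- tail of a losing candidate is never forced
ghci> minimum ["zebra" ++ undefined, "apple"]
"apple"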

asked Jan 15 '12 by Jeff Burdges


People also ask

Why is Haskell lazy?

Haskell is a lazy language. It does not evaluate expressions until it absolutely must. This frequently allows our programs to save time by avoiding unnecessary computation, but it puts them at greater risk of leaking memory. There are ways of introducing strictness into our programs when we don't want lazy evaluation.

Is Haskell lazy or eager?

Haskell is a lazy language, meaning that it employs lazy evaluation. Before explaining lazy evaluation, let's first explain strict evaluation, which most readers will likely be more familiar with. Strict evaluation means that arguments to functions are evaluated prior to being passed to the functions.
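A tiny illustration of the difference (const simply ignores its second argument):

-- under strict evaluation this would evaluate undefined and crash;
-- under lazy evaluation the unused argument is never touched
ghci> const 42 undefined
42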

Is Haskell lazy by default?

Haskell uses lazy evaluation by default, although you can modify this to make functions strict.
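For example, a minimal sketch of opting into strictness with foldl' (sumLazy and sumStrict are illustrative names, not library functions):

import Data.List (foldl')

sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0      -- lazy accumulator: builds a chain of (+) thunks

sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0   -- accumulator forced at each step (via seq)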

Why lazy evaluation is important?

The benefits of lazy evaluation include: the ability to define control-flow structures as abstractions instead of primitives, and the ability to define potentially infinite data structures. The latter allows for more straightforward implementations of some algorithms.
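A classic sketch of the infinite-structure point:

-- an infinite list of Fibonacci numbers; only the demanded prefix is computed
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

ghci> take 10 fibs
[0,1,1,2,3,5,8,13,21,34]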


2 Answers

xs ++ ys adds some overhead in all the list cells from xs, but once it reaches the end of xs it's free — it just returns ys.

Looking at the definition of (++) helps to see why:

[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

i.e., it has to "re-build" the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.

The key thing to realise is that appending isn't done all at once; a new linked list is incrementally built by first walking through all of xs, and then putting ys where the [] would go.
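For example, roughly how [1,2] ++ ys unfolds as the result is consumed:

[1,2] ++ ys
  = 1 : ([2] ++ ys)       -- first cell rebuilt on demand
  = 1 : 2 : ([] ++ ys)    -- second cell rebuilt on demand
  = 1 : 2 : ys            -- end of the first list; ys returned as-is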

So, you don't have to worry about reaching the end of b and suddenly incurring the one-time cost of "appending" a to it; the cost is spread out over all the elements of b.

Vectors are a different matter entirely; they're strict in their structure, so even examining just the first element of xs V.++ ys incurs the entire overhead of allocating a new vector and copying xs and ys to it, just as in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you'd have to write your own append operation with those anyway. You could represent a bunch of appended (immutable) vectors as [Vector a] or similar if this is a problem for you, but that just moves the overhead to when you flatten it back into a single Vector, and it sounds like you're more interested in mutable vectors.

answered Sep 28 '22 by ehird


Try

minimumrotation :: Ord a => [a] -> [a]
-- every rotation is a length-len prefix of a tail of the cycled list
-- (tails comes from Data.List)
minimumrotation xs = minimum . take len . map (take len) $ tails (cycle xs)
  where
    len = length xs

I expect that to be faster than what you have, though index-juggling on an unboxed Vector or UArray would probably be still faster. But, is it really a bottleneck?
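A rough sketch of what that index-juggling could look like on an unboxed Vector (compareRotations and minimumRotationIndex are hypothetical helpers, not library functions):

import qualified Data.Vector.Unboxed as U
import Data.List (minimumBy)

-- compare the rotation starting at i with the one starting at j,
-- element by element, without building either rotation
compareRotations :: U.Vector Char -> Int -> Int -> Ordering
compareRotations v i j = go 0
  where
    n = U.length v
    go k
      | k == n    = EQ
      | otherwise = case compare (v U.! ((i + k) `mod` n))
                                 (v U.! ((j + k) `mod` n)) of
                      EQ    -> go (k + 1)
                      other -> other

-- index of the lexicographically minimal rotation (assumes a non-empty vector)
minimumRotationIndex :: U.Vector Char -> Int
minimumRotationIndex v = minimumBy (compareRotations v) [0 .. U.length v - 1]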

answered Sep 28 '22 by Daniel Fischer