Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What container really mimics std::vector in Haskell?

Tags:

c++

haskell

The problem

I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one. This means that the size of the container, at the end, will always be n.

Each element, i, of the container depends on at least 2 and up to 4 previous results.

The container have to provide:

  • constant time insertions at either beginning or end (one of the two, not necessarily both)
  • constant time indexing in the middle

or alternatively (given a O(n) initialization):

  • constant time single element edits
  • constant time indexing in the middle

What is std::vector and why is it relevant

For those of you who don't know C++, std::vector is a dynamically sized array. It is a perfect fit for this problem because it is able to:

  • reserve space at construction
  • offer constant time indexing in the middle
  • offer constant time insertion at the end (with a reserved space)

Therefore this problem is solvable in O(n) complexity, in C++.

Why Data.Vector is not std::vector

Data.Vector, together with Data.Array, provide similar functionality to std::vector, but not quite the same. Both, of course, offer constant time indexing in the middle, but they offer neither constant time modification ((//) for example is at least O(n)) nor constant time insertion at either beginning of end.

Conclusion

What container really mimics std::vector in Haskell? Alternatively, what is my best shot?

like image 276
Shoe Avatar asked Jun 09 '14 15:06

Shoe


3 Answers

From reddit comes the suggestion to use Data.Vector.constructN:

O(n) Construct a vector with n elements by repeatedly applying the generator function to the already constructed part of the vector.

 constructN 3 f = let a = f <> ; b = f <a> ; c = f <a,b> in f <a,b,c>

For example:

λ import qualified Data.Vector as V
λ V.constructN 10 V.length
fromList [0,1,2,3,4,5,6,7,8,9]
λ V.constructN 10 $ (1+) . V.sum
fromList [1,2,4,8,16,32,64,128,256,512]
λ V.constructN 10 $ \v -> let n = V.length v in if n <= 1 then 1 else (v V.! (n - 1)) + (v V.! (n - 2))
fromList [1,1,2,3,5,8,13,21,34,55]

This certainly seems to qualify to solve the problem as you've described it above.

like image 137
rampion Avatar answered Nov 16 '22 12:11

rampion


The first data structures that come to my mind are either Maps from Data.Map or Sequences from Data.Sequence.

Update

Data.Sequence

Sequences are persistent data structures that allow most operations efficient, while allowing only finite sequences. Their implementation is based on finger-trees, if you are interested. But which qualities does it have?

  • O(1) calculation of the length
  • O(1) insert at front/back with the operators <| and |> respectively.
  • O(n) creation from a list with fromlist
  • O(log(min(n1,n2))) concatenation for sequences of length n1 and n2.
  • O(log(min(i,n-i))) indexing for an element at position i in a sequence of length n.

Furthermore this structure supports a lot of the known and handy functions you'd expect from a list-like structure: replicate, zip, null, scans, sort, take, drop, splitAt and many more. Due to these similarities you have to do either qualified import or hide the functions in Prelude, that have the same name.

Data.Map

Maps are the standard workhorse for realizing a correspondence between "things", what you might call a Hashmap or associave array in other programming languages are called Maps in Haskell; other than in say Python Maps are pure - so an update gives you back a new Map and does not modify the original instance.

Maps come in two flavors - strict and lazy.

Quoting from the Documentation

Strict

API of this module is strict in both the keys and the values.

Lazy

API of this module is strict in the keys, but lazy in the values.

So you need to choose what fits best for your application. You can try both versions and benchmark with criterion.

Instead of listing the features of Data.Map I want to pass on to

Data.IntMap.Strict

Which can leverage the fact that the keys are integers to squeeze out a better performance Quoting from the documentation we first note:

Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).

So what are the characteristics for IntMaps

  • O(min(n,W)) for (unsafe) indexing (!), unsafe in the sense that you will get an error if the key/index does not exist. This is the same behavior as Data.Sequence.
  • O(n) calculation of size
  • O(min(n,W)) for safe indexing lookup, which returns a Nothing if the key is not found and Just a otherwise.
  • O(min(n,W)) for insert, delete, adjust and update

So you see that this structure is less efficient than Sequences, but provide a bit more safety and a big benefit if you actually don't need all entries, such the representation of a sparse graph, where the nodes are integers.

For completeness I'd like to mention a package called persistent-vector, which implements clojure-style vectors, but seems to be abandoned as the last upload is from (2012).

Conclusion

So for your use case I'd strongly recommend Data.Sequence or Data.Vector, unfortunately I don't have any experience with the latter, so you need to try it for yourself. From the stuff I know it provides a powerful thing called stream fusion, that optimizes to execute multiple functions in one tight "loop" instead of running a loop for each function. A tutorial for Vector can be found here.

like image 44
epsilonhalbe Avatar answered Nov 16 '22 14:11

epsilonhalbe


When looking for functional containers with particular asymptotic run times, I always pull out Edison.

Note that there's a result that in a strict language with immutable data structures, there's always a logarithmic slowdown to implementing mutable data structure on top of them. It's an open problem whether the limited mutation hidden behind laziness can avoid that slowdown. There also the issue of persistent vs. transient...

Okasaki is still a good read for background, but finger trees or something more complex like an RRB-tree should be available "off-the-shelf" and solve your problem.

like image 3
Boyd Stephen Smith Jr. Avatar answered Nov 16 '22 12:11

Boyd Stephen Smith Jr.