Haskell vector C++ push_back analogue

1 Answers

No there really is no such facility in Data.Vector. It isn't too difficult to implement this from scratch using MutableArray like Data.Vector.Mutable does (see my implementation below), but there are some significant drawbacks. In particular, all of its operations end up happening inside some state context usually ST or IO. This has the downsides that

Any code that manipulates such a data structure ends up having to be monadic
The compiler is much less likely to be able to optimize. For example, libraries like vector use something really clever called fusion to optimize away intermediate allocations. This sort of thing is not possible in a state context.
Parallelism is going to be a lot tougher: in ST I can't even have two threads and in IO I will have race conditions all over the place. The nasty bit here is that any sharing is going to have to happen in IO.

As if all this wasn't enough, garbage collection also performs better inside pure code.

What do I do then?

It isn't particularly often that you have a need for exactly this behaviour - usually you are better off using an immutable data structure (thereby avoiding all of the aforementioned problems) which does something similar. Just limiting ourselves to containers which comes with GHC, some alternatives include:

if you are almost always just using push_back, maybe you just want a stack (a plain old [a]).
if you anticipate doing more push_back than lookups, Data.Sequence gives you O(1) appending to either end and O(log n) lookup.
if you are interested in a lot of operations especially hashmap-like, Data.IntMap is pretty optimized. Even if the theoretical cost of those operations is O(log n), you will need a pretty big IntMap to start feeling those costs.

Making something like C++ `vector`

Of course, if one doesn't care about the restrictions mentioned initially, there is no reason not to have a C++ like vector. Just for fun, I went ahead and implemented this from scratch (needs packages data-default and primitive).

The reason this code is probably not already in some library is that it goes against much of the spirit of Haskell (I do this with the intent of conforming to a C++ style vector).

The only operation that actually makes a new vector is newVector - everything else "modifies" an existing vector. Since pushBack doesn't return a new GrowVector, it has to modify the existing one (including its length and/or capacity), so length and capacity have to be "pointers". In turn, that means that even getting the length is a monadic operation.
While this isn't unboxed, it would not be too difficult to replicate vectors data family approach - it is just tedious¹.

With that said:

module GrowVector (
  GrowVector, newEmpty, size, read, write, pushBack, popBack
) where 

import Data.Primitive.Array
import Data.Primitive.MutVar
import Data.Default
import Control.Monad
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (length, read)

data GrowVector s a = GrowVector
  { underlying :: MutVar s (MutableArray s a) -- ^ underlying array
  , length :: MutVar s Int                    -- ^ perceived length of vector
  , capacity :: MutVar s Int                  -- ^ actual capacity
  }

type GrowVectorIO = GrowVector (PrimState IO)

-- | Make a new empty vector with the given capacity. O(n)
newEmpty :: (Default a, PrimMonad m) => Int -> m (GrowVector (PrimState m) a)
newEmpty cap = do
  arr <- newArray cap def
  GrowVector <$> newMutVar arr <*> newMutVar 0 <*> newMutVar cap

-- | Read an element in the vector (unchecked). O(1)
read :: PrimMonad m => GrowVector (PrimState m) a -> Int -> m a
g `read` i = do arr <- readMutVar (underlying g); arr `readArray` i

-- | Find the size of the vector. O(1)
size :: PrimMonad m => GrowVector (PrimState m) a -> m Int
size g = readMutVar (length g)

-- | Double the vector capacity. O(n)
resize :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> m ()
resize g = do
  curCap <- readMutVar (capacity g)         -- read current capacity
  curArr <- readMutVar (underlying g)       -- read current array
  curLen <- readMutVar (length g)           -- read current length
  newArr <- newArray (2 * curCap) def       -- allocate a new array twice as big
  copyMutableArray newArr 1 curArr 1 curLen -- copy the old array over
  underlying g `writeMutVar` newArr         -- use the new array in the vector
  capacity g `modifyMutVar'` (*2)           -- update the capacity in the vector

-- | Write an element to the array (unchecked). O(1)
write :: PrimMonad m => GrowVector (PrimState m) a -> Int -> a  -> m ()
write g i x = do arr <- readMutVar (underlying g); writeArray arr i x

-- | Pop an element of the vector, mutating it (unchecked). O(1)
popBack :: PrimMonad m => GrowVector (PrimState m) a -> m a
popBack g = do
  s <- size g;
  x <- g `read` (s - 1)
  length g `modifyMutVar'` (+ negate 1)
  pure x

-- | Push an element. (Amortized) O(1)
pushBack :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> a -> m ()
pushBack g x = do
  s <- readMutVar (length g)                -- read current size
  c <- readMutVar (capacity g)              -- read current capacity
  when (s+1 == c) (resize g)                -- if need be, resize
  write g (s+1) x                           -- write to the back of the array
  length g `modifyMutVar'` (+1)             -- increase te length

Current semantics of `grow`

I think the github issue does a pretty good job of explaining the semantics:

I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.

Basically you should use grow when you want a new mutable vector of an increased size, starting with the elements of the old vector (and no longer care about the old vector). This is quite useful - for example one could implement GrowVector using MVector and grow.

¹ the approach is that for every new type of unboxed vector you want to have, you make a data instance that "expands" your type into a fixed number of unboxed arrays (or other unboxed vectors). This is the point of data family - to allow different instantiations of a type to have totally different runtime representations, and to also be extensible (you can add your own data instance if you want).

answered Sep 19 '22 04:09

Alec

Related questions
                            
                                Haskell counted list type
                            
                                Why does nested forever leak memory?
                            
                                Is there any good Haskell indentation script for Vim?
                            
                                Closed type families and type inference in Haskell
                            
                                Mutable Array in GHC Compact Region
                            
                                Round NaN in Haskell
                            
                                Haskell record syntax and type classes
                            
                                Confusion regarding a passage in the "Kinds and some type-foo" section of learnyouahaskell.com
                            
                                If a thunk results in an exception, is the exception kept as the result of the thunk?
                            
                                Statically linking a C library with a Haskell library
                            
                                Do maps with list keys form a monad?
                            
                                What is the haskell way to copy a directory
                            
                                Free Monad of a Monad
                            
                                What are the differences between layers and extensible-effects?
                            
                                Why does the pointfree version of my function use much more memory
                            
                                The case of the disappearing constraint: Oddities of a higher-rank type
                            
                                How does the line with two backslashes fool C preprocessor?
                            
                                Are there any special challenges for functional programming in an embedded environment?
                            
                                How are `ghc-pkg` and `cabal` programs related? (Haskell)
                            
                                Do Accelerate and Repa have different use cases?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Haskell vector C++ push_back analogue

Tags:

data-structures

haskell

vector

amortized-analysis

Aleksandr Pakhomov

People also ask

1 Answers

What do I do then?

Making something like C++ `vector`

Current semantics of `grow`

Alec

Recent Activity

Donate For Us

Haskell vector C++ push_back analogue

Tags:

data-structures

haskell

vector

amortized-analysis

Aleksandr Pakhomov

People also ask

1 Answers

What do I do then?

Making something like C++ vector

Current semantics of grow

Alec

Related questions

Recent Activity

Donate For Us

Making something like C++ `vector`

Current semantics of `grow`