I've discovered that Haskell Data.Vector.*
miss C++ std::vector::push_back
's functionality. There is grow
/unsafeGrow
, but they seem to have O(n) complexity.
Is there a way to grow vectors in O(1) amortized time for an element?
Yes, std::vector<T>::push_back() creates a copy of the argument and stores it in the vector.
The push_back() function is used to insert an element at the end of a vector.
Vector is a Haskell library for working with arrays. It has an emphasis on very high performance through loop fusion, whilst retaining a rich interface. The main data types are boxed and unboxed arrays, and arrays may be immutable (pure), or mutable.
Numeric Haskell: A Vector Tutorial. Vector is a Haskell library for working with arrays. It has an emphasis on very high performance through loop fusion, whilst retaining a rich interface. The main data types are boxed and unboxed arrays, and arrays may be immutable (pure), or mutable.
vector::push_back () push_back () function is used to push elements into a vector from the back. The new value is inserted into the vector at the end, after the current last element and the container size is increased by 1.
vector::push_back() and vector::pop_back() in C++ STL. Vectors are same as dynamic arrays with the ability to resize itself automatically when an element is inserted or deleted, with their storage being handled automatically by the container. vector::push_back() push_back() function is used to push elements into a vector from the back.
You can create the arrays in many ways, for example, from a regular Haskell list: GHCi will print the contents of the vector as executable code. To create a multidimensional array, you can use a nested list generator to fill it: -- XXX TODO need a better printing function for multidimensional arrays.
No there really is no such facility in Data.Vector
. It isn't too difficult to implement this from scratch using MutableArray
like Data.Vector.Mutable
does (see my implementation below), but there are some significant drawbacks. In particular, all of its operations end up happening inside some state context usually ST
or IO
. This has the downsides that
vector
use something really clever called fusion to optimize away intermediate allocations. This sort of thing is not possible in a state context.ST
I can't even have two threads and in IO
I will have race conditions all over the place. The nasty bit here is that any sharing is going to have to happen in IO
.As if all this wasn't enough, garbage collection also performs better inside pure code.
It isn't particularly often that you have a need for exactly this behaviour - usually you are better off using an immutable data structure (thereby avoiding all of the aforementioned problems) which does something similar. Just limiting ourselves to containers
which comes with GHC, some alternatives include:
push_back
, maybe you just want a stack (a plain old [a]
).push_back
than lookups, Data.Sequence
gives you O(1)
appending to either end and O(log n)
lookup.Data.IntMap
is pretty optimized. Even if the theoretical cost of those operations is O(log n)
, you will need a pretty big IntMap
to start feeling those costs. vector
Of course, if one doesn't care about the restrictions mentioned initially, there is no reason not to have a C++ like vector. Just for fun, I went ahead and implemented this from scratch (needs packages data-default
and primitive
).
The reason this code is probably not already in some library is that it goes against much of the spirit of Haskell (I do this with the intent of conforming to a C++ style vector).
newVector
- everything else "modifies" an existing vector. Since pushBack
doesn't return a new GrowVector
, it has to modify the existing one (including its length and/or capacity), so length
and capacity
have to be "pointers". In turn, that means that even getting the length
is a monadic operation.vector
s data family
approach - it is just tedious1.With that said:
module GrowVector (
GrowVector, newEmpty, size, read, write, pushBack, popBack
) where
import Data.Primitive.Array
import Data.Primitive.MutVar
import Data.Default
import Control.Monad
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (length, read)
data GrowVector s a = GrowVector
{ underlying :: MutVar s (MutableArray s a) -- ^ underlying array
, length :: MutVar s Int -- ^ perceived length of vector
, capacity :: MutVar s Int -- ^ actual capacity
}
type GrowVectorIO = GrowVector (PrimState IO)
-- | Make a new empty vector with the given capacity. O(n)
newEmpty :: (Default a, PrimMonad m) => Int -> m (GrowVector (PrimState m) a)
newEmpty cap = do
arr <- newArray cap def
GrowVector <$> newMutVar arr <*> newMutVar 0 <*> newMutVar cap
-- | Read an element in the vector (unchecked). O(1)
read :: PrimMonad m => GrowVector (PrimState m) a -> Int -> m a
g `read` i = do arr <- readMutVar (underlying g); arr `readArray` i
-- | Find the size of the vector. O(1)
size :: PrimMonad m => GrowVector (PrimState m) a -> m Int
size g = readMutVar (length g)
-- | Double the vector capacity. O(n)
resize :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> m ()
resize g = do
curCap <- readMutVar (capacity g) -- read current capacity
curArr <- readMutVar (underlying g) -- read current array
curLen <- readMutVar (length g) -- read current length
newArr <- newArray (2 * curCap) def -- allocate a new array twice as big
copyMutableArray newArr 1 curArr 1 curLen -- copy the old array over
underlying g `writeMutVar` newArr -- use the new array in the vector
capacity g `modifyMutVar'` (*2) -- update the capacity in the vector
-- | Write an element to the array (unchecked). O(1)
write :: PrimMonad m => GrowVector (PrimState m) a -> Int -> a -> m ()
write g i x = do arr <- readMutVar (underlying g); writeArray arr i x
-- | Pop an element of the vector, mutating it (unchecked). O(1)
popBack :: PrimMonad m => GrowVector (PrimState m) a -> m a
popBack g = do
s <- size g;
x <- g `read` (s - 1)
length g `modifyMutVar'` (+ negate 1)
pure x
-- | Push an element. (Amortized) O(1)
pushBack :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> a -> m ()
pushBack g x = do
s <- readMutVar (length g) -- read current size
c <- readMutVar (capacity g) -- read current capacity
when (s+1 == c) (resize g) -- if need be, resize
write g (s+1) x -- write to the back of the array
length g `modifyMutVar'` (+1) -- increase te length
grow
I think the github issue does a pretty good job of explaining the semantics:
I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.
Basically you should use grow
when you want a new mutable vector of an increased size, starting with the elements of the old vector (and no longer care about the old vector). This is quite useful - for example one could implement GrowVector
using MVector
and grow
.
1 the approach is that for every new type of unboxed vector you want to have, you make a data instance
that "expands" your type into a fixed number of unboxed arrays (or other unboxed vectors). This is the point of data family
- to allow different instantiations of a type to have totally different runtime representations, and to also be extensible (you can add your own data instance
if you want).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With