I'm playing around with the idea of a compact stack—one whose space requirements approach that of an array as its size increases. A candidate structure:
{-# LANGUAGE BangPatterns #-}
import Data.Primitive.SmallArray

data Stack a
  = Empty
  | Zero (Stack a)
  | One !(SmallArray a) (Stack a)
  | Two !(SmallArray a) !(SmallArray a) (Stack a)
-- Invariant: the array size at depth `n` is `2^n`.
push :: a -> Stack a -> Stack a
push = pushA . pure
pushA :: SmallArray a -> Stack a -> Stack a
pushA sa Empty = One sa Empty
pushA sa (Zero more) = One sa more
pushA sa1 (One sa2 more) = Two sa1 sa2 more
pushA sa1 (Two sa2 sa3 more) = One sa1 (pushA (sa2 <> sa3) more)
pop :: Stack a -> Maybe (a, Stack a)
pop stk = do
  (sa, stk') <- popA stk
  hd <- indexSmallArrayM sa 0
  Just (hd, stk')
popA :: Stack a -> Maybe (SmallArray a, Stack a)
popA Empty = Nothing
popA (Zero more) = do
  (sa, more') <- popA more
  -- split sa in two
  let len' = sizeofSmallArray sa `quot` 2
      !sa1 = cloneSmallArray sa 0 len'
      !sa2 = cloneSmallArray sa len' len'
  Just (sa1, One sa2 more')
popA (One sa more) = Just (sa, Zero more)
popA (Two sa1 sa2 more) = Just (sa1, One sa2 more)
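For a small worked example of these definitions (exampleShape is only an illustration, assuming the code above):

exampleShape :: Stack Int
exampleShape = foldr push Empty [1, 2, 3, 4]
-- Writing each SmallArray as a list, this builds
--   Two [1] [2] (One [3,4] Empty)
-- Pushing 2 onto a full Two level merged [3] and [4] into one array a level down.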
Some numerical experimentation suggests that I can get an O(log n) average cost per operation for a sequence of n pushes. But is it possible to analyze this structure as having O(log n) amortized cost per push or pop? Or, if not, can this be done for a similar structure? I haven't been able to find an appropriate debit invariant. The tricky case seems to be a sequence of Two nodes followed by a One node, but I may just be approaching this all wrong.
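For concreteness, here is a digit-level sketch of the sort of experiment I mean (pushDigits and averageCost are illustrative helpers, not part of the structure above): each level is modelled by how many arrays it holds, and carrying out of a full level at depth d is charged 2^(d+1) for the array append.

pushDigits :: Int -> [Int] -> (Int, [Int])
pushDigits _ []       = (0, [1])
pushDigits _ (0 : ds) = (0, 1 : ds)
pushDigits _ (1 : ds) = (0, 2 : ds)
pushDigits d (2 : ds) =
  let (c, ds') = pushDigits (d + 1) ds
  in (c + 2 ^ (d + 1), 1 : ds')   -- append two arrays of size 2^d
pushDigits _ (x : _)  = error ("invalid digit: " ++ show x)

-- Average cost per push over n pushes starting from the empty stack.
averageCost :: Int -> Double
averageCost n = fromIntegral total / fromIntegral n
  where
    (total, _) = foldl step (0 :: Int, []) [1 .. n]
    step (c, ds) _ = let (c', ds') = pushDigits 0 ds in (c + c', ds')

Evaluating averageCost at a few powers of two grows roughly like log n, which is what suggests the bound above.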
I believe I've figured out a way. The number system I suggested in the question turns out not to be the right one; it doesn't support O(log n) pop (or at least does not do so simply). We can patch this up by switching from 0/1/2 redundant binary to 1/2/3 redundant binary.
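Before the code, here is the new number system at the digit level (inc and dec are an illustrative sketch only, mirroring the pushA and popA below); a digit at level k counts arrays of size 2^k.

inc :: [Int] -> [Int]
inc []       = [1]
inc (1 : ds) = 2 : ds
inc (2 : ds) = 3 : ds
inc (3 : ds) = 2 : inc ds   -- carry: two arrays merge and move up a level
inc (d : _)  = error ("invalid digit: " ++ show d)

dec :: [Int] -> [Int]
dec (3 : ds) = 2 : ds
dec (2 : ds) = 1 : ds
dec (1 : []) = []            -- the whole structure empties out
dec (1 : ds) = 2 : dec ds    -- borrow: split one array from the level above
dec (d : _)  = error ("invalid digit: " ++ show d)
dec []       = error "dec: empty stack"

Every digit stays in {1, 2, 3}: inc carries only out of a 3 and dec borrows only out of a 1, exactly mirroring pushA and popA below.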
import Data.Primitive.SmallArray

-- Note the lazy field in the Two constructor.
data Stack a
  = Empty
  | One !(SmallArray a) !(Stack a)
  | Two !(SmallArray a) !(SmallArray a) (Stack a)
  | Three !(SmallArray a) !(SmallArray a) !(SmallArray a) !(Stack a)
push :: a -> Stack a -> Stack a
push = pushA . pure
pushA :: SmallArray a -> Stack a -> Stack a
pushA sa Empty = One sa Empty
pushA sa1 (One sa2 more) = Two sa1 sa2 more
pushA sa1 (Two sa2 sa3 more) = Three sa1 sa2 sa3 more
pushA sa1 (Three sa2 sa3 sa4 more) = Two sa1 sa2 (pushA (sa3 <> sa4) more)
pop :: Stack a -> Maybe (a, Stack a)
pop stk = do
  ConsA sa stk' <- pure $ popA stk
  hd <- indexSmallArrayM sa 0
  Just (hd, stk')
data ViewA a = EmptyA | ConsA !(SmallArray a) (Stack a)
popA :: Stack a -> ViewA a
popA Empty = EmptyA
popA (Three sa1 sa2 sa3 more) = ConsA sa1 (Two sa2 sa3 more)
popA (Two sa1 sa2 more) = ConsA sa1 (One sa2 more)
popA (One sa more) = ConsA sa $
  case popA more of
    EmptyA -> Empty
    ConsA sa1 more' -> Two sa2 sa3 more'
      where
        len' = sizeofSmallArray sa1 `quot` 2
        sa2 = cloneSmallArray sa1 0 len'
        sa3 = cloneSmallArray sa1 len' len'
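As a quick sanity check of the code above (drain is an illustrative helper, not part of the structure), elements come back out in last-in, first-out order:

-- Drain a stack into a list by repeated popping.
drain :: Stack a -> [a]
drain stk = case pop stk of
  Nothing        -> []
  Just (x, stk') -> x : drain stk'

-- With foldr, the last element pushed is 1, so it comes out first:
-- >>> drain (foldr push Empty [1 .. 10 :: Int])
-- [1,2,3,4,5,6,7,8,9,10]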
The first important step in proving this has the desired amortized bounds is to choose a debit invariant[*]. This had me stuck for quite a while, but I think I've got it.

Debit invariant: We allow the lazy Stack in a Two node as many debits as there are elements stored in that and all earlier Two nodes.

Claim: push and pop run in O(log n) amortized time.
push: We consider each of the cases in turn.

Empty is always trivial.

One: We increase the debit allowance below.

Two: We reduce the debit allowance of nodes below by 1 unit. We pay O(log n) to discharge the excess debits.
Three: This is the tricky case for push. We have some number of Three nodes followed by something else. For each Three node, we suspend array-doubling work proportional to the size of the arrays at that node. We pay for that using the additional debit allowance we gain from the elements in the new Two node. When we reach the end of the Three chain, we need to do something a bit funny: we may need the full debit allowance below, so we use debit passing to spread the debits for the final array append across all the earlier nodes.
At the end of the chain, we have either Empty, One, or Two. If we have Empty or One, we're done. If we have Two, then changing it to Three reduces the debit allowance below. But we also gain debit allowance below from all the Threes that have changed to Twos! Our net loss of debit allowance is just 1, so we're golden.
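As a concrete instance of this case (using the illustrative digit sketch from earlier), take a stack whose levels hold 3, 3, 3, 2 arrays:

trickyPush :: [Int]
trickyPush = inc [3, 3, 3, 2]   -- == [2,2,2,3]
-- Every Three became a Two, each gaining allowance for the suspension it
-- created; only the final Two-to-Three step loses allowance below it.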
pop: We again proceed by cases.

Empty is trivial.

Three: We increase the debit allowance below.

Two: We reduce the debit allowance on certain nodes by 1 unit; we pay O(log n) to discharge the excess debits.

One: This is the hard case. We have some number of One nodes followed by something else. For each One, we perform a split. We place debits to pay for those, discharging the ones at the root. At the end, we have a situation similar to that for push: the tricky case is ending in Two, where we use the fact that all the new Twos pay for the loss of the final Two.
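And the corresponding hard case for pop, again in the illustrative digit sketch:

trickyPop :: [Int]
trickyPop = dec [1, 1, 1, 2]    -- == [2,2,2,1]
-- Each One became a Two, paying for its split; the allowance those new Twos
-- provide covers the final Two dropping to a One.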
One might worry that enough thunks could accumulate in the structure to negate the compactness of the array-based representation. Fortunately, this is not the case. A thunk can appear only on the Stack in a Two node. But any operation on that node will turn it into a One or a Three, forcing the Stack. So thunks can never accumulate in chains, and we never have more than one thunk per node.
[*] Okasaki, C. (1998). Purely Functional Data Structures. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511530104, or read the relevant parts of his thesis online.