 

Infinite lazy bitmap

I am trying to construct a lazy data structure that holds an infinite bitmap. I would like to support the following operations:

  1. true :: InfBitMap

    Returns an infinite bitmap of True, i.e. all positions should have value True.

  2. falsify :: InfBitMap -> [Int] -> InfBitMap

Set all positions in the list to False. The list is possibly infinite. For example, falsify true [0,2..] will return a bitmap where all (and only) odd positions are True.

  3. check :: InfBitMap -> Int -> Bool

Check the value at the given index.

Here is what I could do so far.

import Data.List (find, foldl')
import Data.Maybe (fromJust)
import Data.Sequence (Seq, (><), index, singleton, update)

-- InfBitMap will look like [(@), (@, @), (@, @, @, @)..]
type InfBitMap = [Seq Bool]

true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True

-- O(L * log N) where N is the biggest index in the list checked for later
-- and L is the length of the index list. It is assumed that the list is
-- sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
     where
         -- Update each sequence with all indices within its length
         -- Basically composes a list of (update pos False) for all positions
         -- within the length of the sequence and then applies it.
         falsify' is l = foldl' (.) id
                                (map ((flip update) False)
                                     (takeWhile (< length l) is))
                         $ l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i
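
For reference, a quick spot check of the definitions above (spotCheck is only an illustrative name, not part of the intended API):

-- Illustration only: evaluate a few positions of the falsified bitmap.
spotCheck :: [Bool]
spotCheck = map (check (falsify true [0, 2 ..])) [0 .. 5]
-- == [False,True,False,True,False,True]: even positions falsified, odd ones still True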

I am wondering if there is some Haskellish concept/data-structure that I am missing that would make my code more elegant / more efficient (constants do not matter to me, just order). I tried looking at Zippers and Lenses but they do not seem to help. I would like to keep the complexities of updates and checks logarithmic (maybe just amortized logarithmic).

Note: before someone suspects it, no, this is not a homework problem!

Update:

It just occurred to me that check can be improved to:

-- O(log N) where N is the index.
-- Returns "collapsed" bitmap for later more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
    where
        ls'@(l:_) = dropWhile ((<= i) . length) ls

Which can be turned into a Monad for code cleanliness.
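
One way to read "turned into a Monad" is to thread the collapsed bitmap through a State monad, so that later checks start from the already-collapsed list. A minimal sketch (checkS and demo are illustrative names only; check here is the pair-returning version above):

import Control.Monad.State (State, evalState, state)

-- Thread the collapsed bitmap through State so that later checks reuse
-- the prefix that earlier checks have already dropped.
checkS :: Int -> State InfBitMap Bool
checkS i = state (\bm -> check bm i)

demo :: (Bool, Bool)
demo = evalState ((,) <$> checkS 3 <*> checkS 10) (falsify true [0, 2 ..])
-- == (True, False): index 3 is odd, index 10 is even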

asked Sep 02 '14 by aelguindy

1 Answer

A slight variation on the well-known integer trie seems to be applicable here.

{-# LANGUAGE DeriveFunctor #-}

data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)

true :: Trie Bool
true = Trie True true true

-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (path (i + 1) []) where
    -- Nodes are numbered in level order: the root is node 1 and node n
    -- has children 2*n (left) and 2*n+1 (right).  'path' recovers the
    -- left/right turns leading from the root down to node n.
    path 1 acc = acc
    path n acc = path (div n 2) (even n : acc)
    go (Trie a _ _) []              = a
    go (Trie _ l r) (goLeft : rest) = go (if goLeft then l else r) rest

-- O(log(index))
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (path (i + 1) []) where
    -- Same level-order numbering as in 'check'.
    path 1 acc = acc
    path n acc = path (div n 2) (even n : acc)
    go (Trie a l r) []              = Trie (f a) l r
    go (Trie a l r) (goLeft : rest)
        | goLeft    = Trie a (go l rest) r
        | otherwise = Trie a l (go r rest)
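
As a quick illustration of these two functions (flipped and bitsAt4And5 are illustrative names only):

-- Flip the bit at index 4 and look at indices 4 and 5.
flipped :: Trie Bool
flipped = modify true 4 not

bitsAt4And5 :: (Bool, Bool)
bitsAt4And5 = (check flipped 4, check flipped 5)   -- == (False, True)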

Unfortunately we can't use modify to implement falsify because we can't handle infinite lists of indices that way (all modifications have to be performed before an element of the trie can be inspected). Instead, we should do something more like a merge:

-- Indices in the list must be strictly ascending.  Nodes are visited in
-- the same level-order numbering as in 'check' and 'modify'.
ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t is = go 1 t is where
    go _ t [] = t
    go i t@(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
        LT -> Trie a (go (2*i) l ((i', f):is)) (go (2*i+1) r ((i', f):is))
        GT -> go i t is
        EQ -> Trie (f a) (go (2*i) l is) (go (2*i+1) r is)

falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]

We assume strictly ascending indices in is, since otherwise we would skip places in the trie or even get non-termination, for example in check (falsify t (repeat 0)) 1.
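
For instance, the falsify true [0,2..] example from the question can be spot-checked like this (main is only an illustration):

main :: IO ()
main = do
    let t = falsify true [0, 2 ..]     -- falsify every even index
    print (map (check t) [0 .. 7])     -- [False,True,False,True,False,True,False,True]
    print (check t 1000001)            -- True: an odd index is never falsified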

The time complexities are a bit complicated by laziness. In check (falsify t is) index, we pay an additional cost of a constant times log2(index) comparisons, and a further length (filter (< index) is) comparisons (i.e. the cost of stepping over all the indices smaller than the one we're looking up). You could say it's O(max(log(index), length (filter (< index) is))). Anyway, it's definitely better than the O(length is * log(index)) that we would get for a falsify implemented for finite is-es using modify.

We must keep in mind that tree nodes are evaluated only once, so subsequent check-s of the same index after the first one do not pay any extra cost for falsify. Again, laziness makes this a bit complicated.

This falsify is also pretty well-behaved when we want to traverse a prefix of a trie. Take this toList function:

trieToList :: Trie a -> [a]
trieToList t = go [t] where
    go ts = [a | Trie a _ _ <- ts] 
            ++ go (do {Trie _ l r <- ts; [l, r]})

It's a standard breadth-first traversal, in linear time. The traversal time remains linear when we compute take n $ trieToList (falsify t is), since falsify incurs at most n + length (filter (<n) is) extra comparisons, which is at most 2 * n, assuming strictly increasing is.
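
Concretely, since the nodes are numbered in level order, breadth-first order coincides with index order, so a prefix of the bitmap can be read off directly (bitmapPrefix is an illustrative helper, not part of the answer):

-- Read off a finite prefix of the infinite bitmap, in index order.
bitmapPrefix :: Int -> Trie Bool -> [Bool]
bitmapPrefix n = take n . trieToList

-- bitmapPrefix 8 (falsify true [0,2..])
--   == [False,True,False,True,False,True,False,True]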

(side note: the space requirement of breadth-first traversal is rather painful, but I can't see a simple way to help it, since iterative deepening is even worse here, because there the whole tree must be held in memory, while bfs only has to remember the bottom level of the tree).

answered Oct 24 '22 by András Kovács