Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently find a contiguous range of used/free slots from a Fenwick tree

Assume, that I am tracking the usage of slots in a Fenwick tree. As an example, lets consider tracking 32 slots, leading to a Fenwick tree layout as shown in the image below, where the numbers in the grid indicate the index in the underlying array with counts manipulated by the Fenwick tree where the value in each cell is the sum of "used" items in that segment (i.e. array cell 23 stores the amount of used slots in the range [16-23]). The items at the lowest level (i.e. cells 0, 2, 4, ...) can only have the value of "1" (used slot) or "0" (free slot).

Example Fenwick tree layout

What I am looking for is an efficient algorithm to find the first range of a given number of contiguous free slots.

To illustrate, suppose I have the Fenwick tree shown in the image below in which a total of 9 slots are used (note that the light gray numbers are just added for clarity, not actually stored in the tree's array cells).

Example tree

Now I would like to find e.g. the first contiguous range of 10 free slots, which should find this range:

Example search result

I can't seem to find an efficient way of doing this, and it is giving me a bit of a headache. Note, that as the required amount of storage space is critical for my purposes, I do not wish to extend the design to be a segment tree.

Any thoughts and suggestions on an O(log N) type of solution would be very welcome.

EDIT

Time for an update after bounty period has expired. Thanks for all comments, questions, suggestions and answers. They have made me think things over again, taught me a lot and pointed out to me (once again; one day I may learn this lesson) that I should focus more on the issue I want to solve when asking questions.

Since @Erik P was the only one that provided a reasonable answer to the question that included the requested code/pseudo code, he will receive the bounty.

He also pointed out correctly that O(log N) search using this structure is not going to be possible. Kudos to @DanBjorge for providing a proof that made me think about worst case performance.

The comment and answer of @EvgenyKluev made me realize I should have formulated my question differently. In fact I was already doing in large part what he suggested (see https://gist.github.com/anonymous/7594508 - which shows where I got stuck before posting this question), and asked this question hoping there would be an efficient way to search contiguous ranges, thereby preventing changing this design to a segment tree (which would require an additional 1024 bytes). It appears however that such a change might be the smart thing to do.

For anyone interested, a binary encoded Fenwick tree matching the example used in this question (32 slot fenwick tree encoded in 64 bits) can be found here: https://gist.github.com/anonymous/7594245.

like image 892
Alex Avatar asked Nov 12 '13 18:11

Alex


4 Answers

I think the easiest way to implement all the desired functionality with O(log N) time complexity and at the same time minimize memory requirements is using a bit vector to store all 0/1 (free/used) values. Bit vector can substitute 6 lowest levels of both Fenwick tree and segment tree (if implemented as 64-bit integers). So height of these trees may be reduced by 6 and space requirements for each of these trees would be 64 (or 32) times less than usual.

Segment tree may be implemented as implicit binary tree sitting in an array (just like a well-known max-heap implementation). Root node at index 1, each left descendant of node at index i is placed at 2*i, each right descendant - at 2*i+1. This means twice as much space is needed comparing to Fenwick tree, but since tree height is cut by 6 levels, that's not a big problem.

Each segment tree node should store a single value - length of the longest contiguous sequence of "free" slots starting at a point covered by this node (or zero if no such starting point is there). This makes search for the first range of a given number of contiguous zeros very simple: start from the root, then choose left descendant if it contains value greater or equal than required, otherwise choose right descendant. After coming to some leaf node, check corresponding word of bit vector (for a run of zeros in the middle of the word).

Update operations are more complicated. When changing a value to "used", check appropriate word of bit vector, if it is empty, ascend segment tree to find nonzero value for some left descendant, then descend the tree to get to rightmost leaf with this value, then determine how newly added slot splits "free" interval into two halves, then update all parent nodes for both added slot and starting node of the interval being split, also set a bit in the bit vector. Changing a value to "free" may be implemented similarly.

If obtaining the number of nonzero items in some range is also needed, implement Fenwick tree over the same bit vector (but separate from the segment tree). There is nothing special in Fenwick tree implementation except that adding together 6 lowest nodes is substituted by "population count" operation for some word of the bit vector. For an example of using Fenwick tree together with bit vector see first solution for Magic Board on CodeChef.

All necessary operations for bit vector may be implemented pretty efficiently using various bitwise tricks. For some of them (leading/trailing zero count and population count) you could use either compiler intrinsics or assembler instructions (depending on target architecture).

If bit vector is implemented with 64-bit words and tree nodes - with 32-bit words, both trees occupy 150% space in addition to bit vector. This may be significantly reduced if each leaf node corresponds not to a single bit vector word, but to a small range (4 or 8 words). For 8 words additional space needed for trees would be only 20% of bit vector size. This makes implementation slightly more complicated. If properly optimized, performance should be approximately the same as in variant for one word per leaf node. For very large data set performance is likely to be better (because bit vector computations are more cache-friendly than walking the trees).

like image 102
Evgeny Kluev Avatar answered Oct 20 '22 07:10

Evgeny Kluev


As mcdowella suggests in their answer, let K2 = K/2, rounding up, and let M be the smallest power of 2 that is >= K2. A promising approach would be to search for contiguous blocks of K2 zeroes fully contained in one size-M block, and once we've found those, check neighbouring size-M blocks to see if they contain sufficient adjacent zeroes. For the initial scan, if the number of 0s in a block is < K2, clearly we can skip it, and if the number of 0s is >= K2 and the size of the block is >= 2*M, we can look at both sub-blocks.

This suggests the following code. Below, A[0 .. N-1] is the Fenwick tree array; N is assumed to be a power of 2. I'm assuming that you're counting empty slots, rather than nonempty ones; if you prefer to count empty slots, it's easy enough to transform from the one to the other.

initialize q as a stack data structure of triples of integers
push (N-1, N, A[n-1]) onto q
# An entry (i, j, z) represents the block [i-j+1 .. i] of length j, which
# contains z zeroes; we start with one block representing the whole array.
# We maintain the invariant that i always has at least as many trailing ones
# in its binary representation as j has trailing zeroes. (**)
initialize r as an empty list of pairs of integers
while q is not empty:
    pop an entry (i,j,z) off q
    if z < K2:
        next

    if FW(i) >= K:
        first_half := i - j/2
        # change this if you want to count nonempty slots:
        first_half_zeroes := A[first_half]
        # Because of invariant (**) above, first_half always has exactly
        # the right number of trailing 1 bits in its binary representation
        # that A[first_half] counts elements of the interval
        # [i-j+1 .. first_half].

        push (i, j/2, z - first_half_zeroes) onto q
        push (first_half, j/2, first_half_zeroes) onto q
    else:
        process_block(i, j, z)

This lets us process all size-M blocks with at least K/2 zeroes in order. You could even randomize the order in which you push the first and second half onto q in order to get the blocks in a random order, which might be nice to combat the situation where the first half of your array fills up much more quickly than the latter half.

Now we need to discuss how to process a single block. If z = j, then the block is entirely filled with 0s and we can look both left and right to add zeroes. Otherwise, we need to find out if it starts with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the previous block ends with a suitable number of zeroes. Similarly, we check if the block ends with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the next block starts with a suitable number of zeroes. So we will need a procedure to find the number of zeroes a block starts or ends with, possibly with a shortcut if it's at least a or at most b. To be precise: let ends_with_zeroes(i, j, min, max) be a procedure that returns the number of zeroes that the block from [i-j+1 .. j] ends with, with a shortcut to return max if the result will be more than max and min if the result will be less than min. Similarly for starts_with_zeroes(i, j, min, max).

def process_block(i, j, z):
    if j == z:
        if i > j:
            a := ends_with_zeroes(i-j, j, 0, K-z)
        else:
            a := 0

        if i < N-1:
            b := starts_with_zeroes(i+j, j, K-z-a-1, K-z-a)
        else:
            b := 0

        if b >= K-z-a:
            print "Found: starting at ", i - j - a + 1
        return

    # If the block doesn't start or end with K2 zeroes but overlaps with a
    # correct solution anyway, we don't need to find it here -- we'll find it
    # starting from the adjacent block.
    a := starts_with_zeroes(i, j, K2-1, j)
    if i > j and a >= K2:
        b := ends_with_zeroes(i-j, j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - j - a + 1
        # Since z < 2*K2, and j != z, we know this block doesn't end with K2
        # zeroes, so we can safely return.
        return

    a := ends_with_zeroes(i, j, K2-1, j)
    if i < N-1 and a >= K2:
        b := starts_with_zeroes(i+j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - a + 1

Note that in the second case where we find a solution, it may be possible to move the starting point left a bit further. You could check for that separately if you need the very first position that it could start.

Now all that's left is to implement starts_with_zeroes and ends_with_zeroes. In order to check that the block starts with at least min zeroes, we can test that it starts with 2^h zeroes (where 2^h <= min) by checking the appropriate Fenwick entry; then similarly check if it starts with 2^H zeroes where 2^H >= max to short cut the other way (except if max = j, it is trickier to find the right count from the Fenwick tree); then find the precise number.

def starts_with_zeroes(i, j, min, max):
    start := i-j

    h2 := 1
    while h2 * 2 <= min:
        h2 := h2 * 2
        if A[start + h2] < h2:
            return min
    # Now h2 = 2^h in the text.
    # If you insist, you can do the above operation faster with bit twiddling
    # to get the 2log of min (in which case, for more info google it).

    while h2 < max and A[start + 2*h2] == 2*h2:
        h2 := 2*h2
    if h2 == j:
        # Walk up the Fenwick tree to determine the exact number of zeroes
        # in interval [start+1 .. i]. (Not implemented, but easy.) Let this
        # number be z.

        if z < j:
            h2 := h2 / 2

    if h2 >= max:
        return max

    # Now we know that [start+1 .. start+h2] is all zeroes, but somewhere in 
    # [start+h2+1 .. start+2*h2] there is a one.
    # Maintain invariant: the interval [start+1 .. start+h2] is all zeroes,
    # and there is a one in [start+h2+1 .. start+h2+step].
    step := h2;
    while step > 1:
        step := step / 2
        if A[start + h2 + step] == step:
             h2 := h2 + step
    return h2

As you see, starts_with_zeroes is pretty bottom-up. For ends_with_zeroes, I think you'd want to do a more top-down approach, since examining the second half of something in a Fenwick tree is a little trickier. You should be able to do a similar type of binary search-style iteration.

This algorithm is definitely not O(log(N)), and I have a hunch that this is unavoidable. The Fenwick tree simply doesn't give information that is that good for your question. However, I think this algorithm will perform fairly well in practice if suitable intervals are fairly common.

like image 43
Erik P. Avatar answered Oct 20 '22 06:10

Erik P.


One quick check, when searching for a range of K contiguous slots, is to find the largest power of two less than or equal to K/2. Any K continuous zero slots must contain at least one Fenwick-aligned range of slots of size <= K/2 that is entirely filled with zeros. You could search the Fenwick tree from the top for such chunks of aligned zeros and then look for the first one that can be extended to produce a range of K contiguous zeros.

In your example the lowest level contains 0s or 1s and the upper level contains sums of descendants. Finding stretches of 0s would be easier if the lowest level contained 0s where you are currently writing 1s and a count of the number of contiguous zeros to the left where you are currently writing zeros, and the upper levels contained the maximum value of any descendant. Updating would mean more work, especially if you had long strings of zeros being created and destroyed, but you could find the leftmost string of zeros of length at least K with a single search to the left branching left where the max value was at least K. Actually here a lot of the update work is done creating and destroying runs of 1,2,3,4... on the lowest level. Perhaps if you left the lowest level as originally defined and did a case by case analysis of the effects of modifications you could have the upper levels displaying the longest stretch of zeros starting at any descendant of a given node - for quick search - and get reasonable update cost.

like image 2
mcdowella Avatar answered Oct 20 '22 05:10

mcdowella


@Erik covered a reasonable sounding algorithm. However, note that this problem has a lower complexity bound of Ω(N/K) in the worst-case.

Proof:

Consider a reduced version of the problem where:

  • N and K are both powers of 2
  • N > 2K >= 4

Suppose your input array is made up of (N/2K) chunks of size 2K. One chunk is of the form K 0s followed by K 1s, every other chunk is the string "10" repeated K times. There are (N/2K) such arrays, each with exactly one solution to the problem (the beginning of the one "special" chunk).

Let n = log2(N), k = log2(K). Let us also define the root node of the tree as being at level 0 and the leaf nodes as being at level n of the tree.

Note that, due to our array being made up of aligned chunks of size 2K, level n-k of the tree is simply going to be made up of the number of 1s in each chunk. However, each of our chunks has the same number of 1s in it. This means that every node at level n-k will be identical, which in turn means that every node at level <= n-k will also be identical.

What this means is that the tree contains no information that can disambiguate the "special" chunk until you start analyzing level n-k+1 and lower. But since all but 2 of the (N/K) nodes at that level are identical, this means that in the worst case you'll have to examine O(N/K) nodes in order to disambiguate the solution from the rest of the nodes.

like image 2
Dan Bjorge Avatar answered Oct 20 '22 05:10

Dan Bjorge