Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generalizing the find-min/find-max stack to arbitrary order statistics?

In this earlier question, the OP asked for a data structure similar to a stack supporting the following operations in O(1) time each:

  • Push, which adds a new element atop the stack,
  • Pop, which removes the top element from the stack,
  • Find-Max, which returns (but does not remove) the largest element of the stack, and
  • Find-Min, which returns (but does not remove) the smallest element of the stack.

A few minutes ago I found this related question asking for a clarification on a similar data structure that instead of allowing for the max and min to be queried, allows for the median element of the stack to be queried. These two data structures seem to be a special case of a more general data structure supporting the following operations:

  • Push, which pushes an element atop the stack,
  • Pop, which pops the top of the stack, and
  • Find-Kth, which for a fixed k determined when the structure is created, returns the kth largest element of the stack.

It is possible to support all of these operations by storing a stack and an balanced binary search tree holding the top k elements, which would enable all these operations to run in O(log k) time. My question is this: is it possible to implement the above data structure faster than this? That is, could we get O(1) for all three operations? Or perhaps O(1) for push and pop and O(log k) for the order statistic lookup?

like image 612
templatetypedef Avatar asked Aug 22 '11 08:08

templatetypedef


3 Answers

Since the structure can be used to sort k elements with O(k) push and find-kth operations, every comparison-based implementation has at least one of these cost Omega(log k), even in an amortized sense, with randomization.

Push can be O(log k) and pop/find-kth can be O(1) (use persistent data structures; push should precompute the order statistic). My gut feeling based on working with lower bounds for comparison-based algorithms is that O(1) push/pop and O(log k) find-kth is doable but requires amortization.

like image 58
tophat Avatar answered Nov 15 '22 18:11

tophat


I think what tophat was saying is, implement a purely functional data structure that supports only O(log k) insert and O(1) find-kth (cached by insert), and then make a stack of these structures. Push inserts into the top version and pushes the update, pop pops the top version, and find-kth operates on the top version. This is O(log k)/O(1)/(1) but super-linear space.

EDIT: I was working on O(1) push/O(1) pop/O(log k) find-kth, and I think it can't be done. The sorting algorithm that tophat referred to can be adapted to get √k evenly spaced order statistics of a length-k array in time O(k + (√k) log k). Problem is, the algorithm must know how each order statistic compares with all other elements (otherwise it might be wrong), which means that it has bucketed everything into one of √k + 1 buckets, which takes Ω(k log (√k + 1)) = Ω(k log k) comparisons on information theoretic grounds. Oops.

Replacing √k by keps for any eps > 0, with O(1) push/O(1) pop, I don't think find-kth can be O(k1 - eps), even with randomization and amortization.

like image 20
quaint Avatar answered Nov 15 '22 18:11

quaint


Whether this is actually faster than your log k implementation, depending on which operations are used most frequently, I propose an implementation with O(1) Find-kth and Pop and O(n) Push, where n is the stack size. And I also want to share this with SO because it is just a hilarious data structure at first sight, but might even be reasonable.

It's best described by a doubly doubly linked stack, or perhaps more easily dscribed as a hybrid of a linked stack and a doubly linked sorted list. Basically each node maintains 4 references to other nodes, the next and previous in stack order and the next and previous in sorted order on the element size. These two linked lists can be implemented using the same nodes, but they work completely seperately, i.e. the sorted linked list doesn't have to know about the stack order and vice versa.

Like a normal linked stack, the collection itself will need to maintain a reference to the top node (and to the bottom?). To accomodate the O(1) nature of the Find-kth method, the collection will also keep a reference to the kth largest element.

The pop method works as follows: The popped node gets removed from the sorted doubly linked list, just like a removal from a normal sorted linked list. It takes O(1) as the collection has a reference to the top. Depending on whether the popped element was larger or smaller than the kth element, the reference to the kth largest element is set to either the previous or the next. So the method still has O(1) complexity.

The push method works just like a normal addition to a sorted linked list, which is a O(n) operation. It start with the smallest element, and inserts the new node when a larger element is encountered. To maintain the correct reference to the kth largest element, again either the previous or next element to the current kth largest element is selected, depending on whether the pushed node was larger or smaller than the kth largest element.

Of course next to this, the reference to the 'top' of the stack has to be set in both methods. Also there's the problem of k > n, for which you haven't specified what the data structure should do. I hope it is clear how it works, otherwise I could add an example.

But ok, not entirely the complexity you had hoped for, but I find this an interesting 'solution'.


Edit: An implementation of the described structure

A bounty was issued on this question, which indicates my original answer wasn’t good enough:P Perhaps the OP would like to see an implementation?

I have implemented both the median problem and the fixed-k problem, in C#. The implementation of the tracker of the median is just a wrapper around the tracker of the kth element, where k can mutate.

To recap the complexities:

  • Push takes O(n)
  • Pop takes O(1)
  • FindKth takes O(1)
  • Change k takes O(delta k)

I have already described the algorithm in reasonable detail in my original post. The implementation is then fairly straightforward(but not so trivial to get right, as there are a lot of inequality signs and if statements to consider). I have commented only to indicate what is done, but not the details of how, as it would otherwise become too large. The code is already quite lengthy for a SO post.

I do want to provide the contracts of all non-trivial public members:

  • K is the index of the element in the sorted linked list to keep a reference too. Is it mutable and when set, the structure is immediately corrected for that.
  • KthValue is the value at that index, unless the structure doesn’t have k elements yet, in which case it returns a default value.
  • HasKthValue exists to easily distinguish these default values from elements which happened to be the default value of its type.
  • Constructors: a null enumerable is interpreted as an empty enumerable, and a null comparer is interpreted as the default. This comparer defines the order used when determining the kth value.

So this is the code:

public sealed class KthTrackingStack<T>
{
    private readonly Stack<Node> stack;
    private readonly IComparer<T> comparer;
    private int k;
    private Node smallestNode;
    private Node kthNode;

    public int K
    {
        get { return this.k; }
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException();
            for (; k < value; k++)
            {
                if (kthNode.NextInOrder == null)
                    return;
                kthNode = kthNode.NextInOrder;
            }
            for (; k >= value; k--)
            {
                if (kthNode.PreviousInOrder == null)
                    return;
                kthNode = kthNode.PreviousInOrder;
            }
        }
    }
    public T KthValue
    {
        get { return HasKthValue ? kthNode.Value : default(T); }
    }
    public bool HasKthValue
    {
        get { return k < Count; }
    }
    public int Count
    {
        get { return this.stack.Count; }
    }

    public KthTrackingStack(int k, IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        if (k < 0) throw new ArgumentOutOfRangeException("k");
        this.k = k;
        this.comparer = comparer ?? Comparer<T>.Default;
        this.stack = new Stack<Node>();
        if (initialElements != null)
            foreach (T initialElement in initialElements)
                this.Push(initialElement);
    }
    public void Push(T value)
    {
        //just a like a normal sorted linked list should the node before the inserted node be found.
        Node nodeBeforeNewNode;
        if (smallestNode == null || comparer.Compare(value, smallestNode.Value) < 0)
            nodeBeforeNewNode = null;
        else
        {
            nodeBeforeNewNode = smallestNode;//untested optimization: nodeBeforeNewNode = comparer.Compare(value, kthNode.Value) < 0 ? smallestNode : kthNode;
            while (nodeBeforeNewNode.NextInOrder != null && comparerCompare(value, nodeBeforeNewNode.NextInOrder.Value) > 0)
                nodeBeforeNewNode = nodeBeforeNewNode.NextInOrder;
        }
        //the following code includes the new node in the ordered linked list
        Node newNode = new Node
                        {
                            Value = value,
                            PreviousInOrder = nodeBeforeNewNode,
                            NextInOrder = nodeBeforeNewNode == null ? smallestNode : nodeBeforeNewNode.NextInOrder
                        };
        if (newNode.NextInOrder != null)
            newNode.NextInOrder.PreviousInOrder = newNode;
        if (newNode.PreviousInOrder != null)
            newNode.PreviousInOrder.NextInOrder = newNode;
        else
            smallestNode = newNode;
        //the following code deals with changes to the kth node due the adding the new node
        if (kthNode != null && comparer.Compare(value, kthNode.Value) < 0)
        {
            if (HasKthValue)
                kthNode = kthNode.PreviousInOrder;
        }
        else if (!HasKthValue)
        {
            kthNode = newNode;
        }
        stack.Push(newNode);
    }

    public T Pop()
    {
        Node result = stack.Pop();
        //the following code deals with changes to the kth node
        if (HasKthValue)
        {
            if (comparer.Compare(result.Value, kthNode.Value) <= 0)
                kthNode = kthNode.NextInOrder;
        }
    else if(kthNode.PreviousInOrder != null || Count == 0)
        {
            kthNode = kthNode.PreviousInOrder;
        }
        //the following code maintains the order in the linked list
        if (result.NextInOrder != null)
            result.NextInOrder.PreviousInOrder = result.PreviousInOrder;
        if (result.PreviousInOrder != null)
            result.PreviousInOrder.NextInOrder = result.NextInOrder;
        else
            smallestNode = result.NextInOrder;
        return result.Value;
    }
    public T Peek()
    {
        return this.stack.Peek().Value;
    }
    private sealed class Node
    {
        public T Value { get; set; }
        public Node NextInOrder { get; internal set; }
        public Node PreviousInOrder { get; internal set; }
    }
}
public class MedianTrackingStack<T>
{
    private readonly KthTrackingStack<T> stack;
    public void Push(T value)
    {
        stack.Push(value);
        stack.K = stack.Count / 2;
    }
    public T Pop()
    {
        T result = stack.Pop();
        stack.K = stack.Count / 2;
        return result;
    }

    public T Median
    {
        get { return stack.KthValue; }
    }
    public MedianTrackingStack(IEnumerable<T> initialElements = null, IComparer<T> comparer = null)
    {
        stack = new KthTrackingStack<T>(initialElements == null ? 0 : initialElements.Count()/2, initialElements, comparer);
    }
}

Of course you're always free to ask any question about this code, as I realize some things may not be obvious from the description and sporadic comments

like image 34
JBSnorro Avatar answered Nov 15 '22 18:11

JBSnorro