How do I further optimize this Data Structure?

Tags: algorithm, time-complexity, data-structures

I was recently asked to build a data structure that supports four operations, namely,

1. Push: Add an element to the DS.
2. Pop: Remove the last pushed element.
3. Find_max: Find the maximum element out of the currently stored elements.
4. Pop_max: Remove the maximum element from the DS.

The elements are integers.

Here is the solution I suggested:

1. Take a stack.
2. Store a pair of elements in it. The pair should be (element, max_so_far), where element is the element at that index and max_so_far is the maximum element seen so far.
3. While pushing an element onto the stack, check the max_so_far of the topmost stack element. If the current number is greater than that, store the current element's value as the new pair's max_so_far; otherwise store the previous max_so_far. This means that pushing is simply an O(1) operation.
4. For pop, simply pop an element off the stack. Again, this operation is O(1).
5. For Find_max, return the max_so_far of the topmost element in the stack. Again, O(1).
6. Popping the max element would involve going through the stack, explicitly removing the max element, and pushing back the elements that were on top of it after assigning them new max_so_far values. This would be linear.
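For reference, the steps above can be sketched in C++ roughly as follows. This is only a minimal illustration of the described approach, not the exact code from the interview: the class name `PairStack` and its member names are made up here, and every operation assumes the stack is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of the (element, max_so_far) stack described above.
class PairStack {
    std::vector<std::pair<int, int>> items_;  // each entry is (element, max_so_far)

public:
    // O(1): the new max_so_far is the larger of v and the previous max_so_far.
    void push(int v) {
        int m = items_.empty() ? v : std::max(v, items_.back().second);
        items_.emplace_back(v, m);
    }

    // O(1): remove and return the most recently pushed element.
    int pop() {
        int v = items_.back().first;
        items_.pop_back();
        return v;
    }

    // O(1): the top entry already carries the maximum of the whole stack.
    int find_max() const { return items_.back().second; }

    // O(n): unwind down to the topmost occurrence of the maximum, drop it,
    // then re-push the unwound elements so their max_so_far is recomputed.
    int pop_max() {
        int target = items_.back().second;
        std::vector<int> above;
        while (items_.back().first != target) {
            above.push_back(items_.back().first);
            items_.pop_back();
        }
        items_.pop_back();  // removes the maximum itself
        for (auto it = above.rbegin(); it != above.rend(); ++it) push(*it);
        return target;
    }
};
```

The linear cost comes entirely from the unwinding loop in `pop_max`; the other three operations only touch the top of the stack.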

I was asked to improve it, but I couldn't.

In terms of time complexity, I guess the overall time could be improved if every operation ran in O(log n), but I can't work out how to achieve that.

asked Sep 17 '14 17:09

Ranveer

3 Answers

One way to get O(log n)-time operations is to mash up two data structures, in this case a doubly linked list and a priority queue (a pairing heap is a good choice). We have a node structure like

struct Node {
    Node *previous, *next;  // doubly linked list
    Node **back, *child, *sibling;  // pairing heap
    int value;
} list_head, *heap_root;

Now, to push, we push in both structures. To find_max, we return the value of the root of the pairing heap. To pop or pop_max, we pop from the appropriate data structure and then use the other node pointers to delete in the other data structure.

answered Sep 19 '22 23:09

David Eisenstat

One approach would be to store pointers to the elements in a doubly-linked list, and also in a max-heap data structure (sorted by value).

Each element would store its position in the doubly-linked list and also in the max-heap.

In this case all of your operations would require O(1) time in the doubly-linked list, plus O(log(n)) time in the heap data structure.
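A minimal sketch of this idea, assuming a `std::list` for insertion order and a hand-rolled binary max-heap of list iterators in which each node records its own heap index. The names `ListHeapStack`, `Node`, and `heap_pos` are illustrative only, and all operations assume a non-empty structure. When several elements share the maximum value, this sketch removes one of them, not necessarily the most recent.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical sketch: doubly-linked list for recency + indexed binary max-heap.
class ListHeapStack {
    struct Node { int value; std::size_t heap_pos; };
    using Iter = std::list<Node>::iterator;

    std::list<Node> order_;   // back() is the most recently pushed element
    std::vector<Iter> heap_;  // max-heap ordered by Node::value

    // Store `it` in heap slot i and record that slot in the node.
    void place(std::size_t i, Iter it) { heap_[i] = it; it->heap_pos = i; }

    void sift_up(std::size_t i) {
        Iter it = heap_[i];
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap_[parent]->value >= it->value) break;
            place(i, heap_[parent]);
            i = parent;
        }
        place(i, it);
    }

    void sift_down(std::size_t i) {
        Iter it = heap_[i];
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= heap_.size()) break;
            if (child + 1 < heap_.size() &&
                heap_[child + 1]->value > heap_[child]->value) ++child;
            if (heap_[child]->value <= it->value) break;
            place(i, heap_[child]);
            i = child;
        }
        place(i, it);
    }

    // O(log n): remove a node from both structures via its stored heap index.
    void erase(Iter it) {
        std::size_t i = it->heap_pos;
        Iter last = heap_.back();
        heap_.pop_back();
        if (last != it) {  // fill the hole with the last heap entry, then repair
            place(i, last);
            sift_up(i);
            sift_down(last->heap_pos);
        }
        order_.erase(it);
    }

public:
    void push(int v) {  // O(log n)
        order_.push_back(Node{v, heap_.size()});
        heap_.push_back(std::prev(order_.end()));
        sift_up(heap_.size() - 1);
    }
    int find_max() const { return heap_.front()->value; }  // O(1)
    int pop() {  // O(log n)
        int v = order_.back().value;
        erase(std::prev(order_.end()));
        return v;
    }
    int pop_max() {  // O(log n)
        int v = heap_.front()->value;
        erase(heap_.front());
        return v;
    }
    bool empty() const { return order_.empty(); }
};
```

Because `std::list` iterators stay valid when other elements are erased, the heap can safely hold them, and the stored `heap_pos` is what lets `pop` find the matching heap slot without searching.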

answered Sep 19 '22 23:09

Peter de Rivaz

Usually, when you need to find elements by quality A (value), and also by quality B (insert order), then you start eyeballing a data structure that actually has two data structures inside that reference each other, or are otherwise interleaved.

For instance: two maps whose keys are quality A and quality B, and whose values are a shared pointer to a struct that contains the value plus iterators back into both maps. Then you have O(log n) to find an element via either quality, and erasure is ~O(log n), since it removes the two iterators from their respective maps.

#include <map>
#include <memory>
#include <string>

// `mountain` is assumed to be a type defined elsewhere.
struct hybrid {
    struct value {
        std::map<std::string, std::shared_ptr<value>>::iterator name_iter;
        std::map<int, std::shared_ptr<value>>::iterator height_iter;
        mountain value;
    };
    std::map<std::string, std::shared_ptr<value>> name_map;
    std::map<int, std::shared_ptr<value>> height_map;

    // at() throws std::out_of_range for a missing key instead of inserting a null pointer
    mountain& find_by_name(const std::string& s) {return name_map.at(s)->value;}
    mountain& find_by_height(int h) {return height_map.at(h)->value;}
    void erase_by_name(const std::string& s) {
        std::shared_ptr<value> v = name_map.at(s); //keeps the node alive until both entries are erased
        name_map.erase(v->name_iter);
        height_map.erase(v->height_iter);
    }
};

However, in your case, you can do even better than this O(logn), since you only need "the most recent" and "the next highest". To make "pop highest" fast, you need a fast way to detect the next highest, which means that needs to be precalculated at insert. To find the "height" position relative to the rest, you need a map of some sort. To make "pop most recent" fast, you need a fast way to detect the next most recent, but that's trivially calculated. I'd recommend creating a map or heap of nodes, where keys are the value for finding the max, and the values are a pointer to the next most recent value. This gives you O(logn) insert, O(1) find most recent, O(1) or O(logn) find maximum value (depending on implementation), and ~O(logn) erasure by either index.
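As a rough illustration of the two-index idea applied to this question, here is a sketch that keeps one ordered container keyed by insertion order and another keyed by value. It is a simplification rather than the pointer-linked layout described above: the monotonically increasing sequence number, and the names `TwoIndexStack`, `by_order_`, and `by_value_`, are assumptions made for this example. All operations assume a non-empty structure.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <utility>

// Hypothetical sketch: two ordered indexes tied together by a sequence number.
class TwoIndexStack {
    std::map<std::uint64_t, int> by_order_;             // sequence number -> value
    std::set<std::pair<int, std::uint64_t>> by_value_;  // (value, sequence number)
    std::uint64_t next_seq_ = 0;

public:
    // O(log n): insert into both indexes under a fresh sequence number.
    void push(int v) {
        by_order_.emplace(next_seq_, v);
        by_value_.emplace(v, next_seq_);
        ++next_seq_;
    }

    // The largest (value, sequence) pair holds the maximum value.
    int find_max() const { return by_value_.rbegin()->first; }

    // O(log n): the highest sequence number is the most recent push.
    int pop() {
        auto it = std::prev(by_order_.end());
        int v = it->second;
        by_value_.erase(std::make_pair(v, it->first));
        by_order_.erase(it);
        return v;
    }

    // O(log n): among equal maxima, the most recently pushed one is removed,
    // because pairs compare by value first and sequence number second.
    int pop_max() {
        auto it = std::prev(by_value_.end());
        int v = it->first;
        by_order_.erase(it->second);
        by_value_.erase(it);
        return v;
    }
};
```

Keying the second index by a sequence number sidesteps storing iterators in each node, at the cost of an O(log n) lookup where a direct pointer would be O(1).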

answered Sep 20 '22 23:09

Mooing Duck
