Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I further optimize this Data Structure?

I was recently asked to build a data structure that supports four operations, namely,

  1. Push: Add an element to the DS.
  2. Pop: Remove the last pushed element.
  3. Find_max: Find the maximum element out of the currently stored elements.
  4. Pop_max: Remove the maximum element from the DS.

The elements are integers.

Here is the solution I suggested:

  1. Take a stack.
  2. Store a pair of elements in it. The pair should be (element, max_so_far), where element is the element at that index and max_so_far is the maximum valued element seen so far.
  3. While pushing an element into the stack, check the max_so_far of the topmost stack element. If current number is greater than that, put the current pair's max_so_far value as the current element's value, else store the previous max_so_far. This mean that pushing would simply be an O(1) operation.
  4. For pop, simply pop an element out of the stack. Again, this operation is O(1).
  5. For Find_max, return the value of the max_so_far of the topmost element in the stack. Again, O(1).
  6. Popping the max element would involve going through the stack and explicitly removing the max element and pushing back the elements on top of it, after allotting new max_so_far values. This would be linear.

I was asked to improve it, but I couldn't.

In terms of time complexity, the overall time can be improved if all operations happen in O(logn), I guess. How to do that, is something I'm unable to get.

like image 344
Ranveer Avatar asked Sep 17 '14 17:09

Ranveer


People also ask

What is data structure optimization?

Starting from the baseline, the major data structure optimization is to flatten the tree object into an 1D array (Fig. 4). In this way, the tree traversals with object data access through member functions are converted into streaming memory accesses to a raw data array.

How do you create a data structure?

Select Data Structure on the Add Object form and click the OK button. Enter the name, description, and product code of a data structure. For a regular data structure, select Regular Data Structure. Complete the Add Object form and select Regular Data Structure, then click the OK button.

What data structure is most suitable for?

An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays.


3 Answers

One way to get O(log n)-time operations is to mash up two data structures, in this case a doubly linked list and a priority queue (a pairing heap is a good choice) . We have a node structure like

struct Node {
    Node *previous, *next;  // doubly linked list
    Node **back, *child, *sibling;  // pairing heap
    int value;
} list_head, *heap_root;

Now, to push, we push in both structures. To find_max, we return the value of the root of the pairing heap. To pop or pop_max, we pop from the appropriate data structure and then use the other node pointers to delete in the other data structure.

like image 32
David Eisenstat Avatar answered Sep 19 '22 23:09

David Eisenstat


One approach would be to store pointers to the elements in a doubly-linked list, and also in a max-heap data structure (sorted by value).

Each element would store its position in the doubly-linked list and also in the max-heap.

In this case all of your operations would require O(1) time in the doubly-linked list, plus O(log(n)) time in the heap data structure.

like image 179
Peter de Rivaz Avatar answered Sep 19 '22 23:09

Peter de Rivaz


Usually, when you need to find elements by quality A (value), and also by quality B (insert order), then you start eyeballing a data structure that actually has two data structures inside that reference each other, or are otherwise interleaved.

For instance: two maps that who's keys are quality A and quality B, who's values are a shared pointer to a struct that contains iterators back to both maps, and the value. Then you have log(n) to find an element via either quality, and erasure is ~O(logn) to remove the two iterators from either map.

struct hybrid {
    struct value {
        std::map<std::string, std::shared_ptr<value>>::iterator name_iter;
        std::map<int, std::shared_ptr<value>>::iterator height_iter;
        mountain value;
    };
    std::map<std::string, std::shared_ptr<value>> name_map;
    std::map<int, std::shared_ptr<value>> height_map;

    mountain& find_by_name(std::string s) {return name_map[s]->value;}
    mountain& find_by_height(int height h) {return height_map[s]->value;}
    void erase_by_name(std::string s) {
        value& v = name_map[s];
        name_map.erase(v.name_iter);
        height_iter.erase(v.height_iter); //note that this invalidates the reference v
    }
};

However, in your case, you can do even better than this O(logn), since you only need "the most recent" and "the next highest". To make "pop highest" fast, you need a fast way to detect the next highest, which means that needs to be precalculated at insert. To find the "height" position relative to the rest, you need a map of some sort. To make "pop most recent" fast, you need a fast way to detect the next most recent, but that's trivially calculated. I'd recommend creating a map or heap of nodes, where keys are the value for finding the max, and the values are a pointer to the next most recent value. This gives you O(logn) insert, O(1) find most recent, O(1) or O(logn) find maximum value (depending on implementation), and ~O(logn) erasure by either index.

like image 38
Mooing Duck Avatar answered Sep 20 '22 23:09

Mooing Duck