Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Skip lists, are they really performing as good as Pugh paper claim?

Tags:

People also ask

What are Skip lists good for?

The skip list is used to store a sorted list of elements or data with a linked list. It allows the process of the elements or data to view efficiently. In one single step, it skips several elements of the entire list, which is why it is known as a skip list.

What is probabilistic skip list?

Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space.

Are skip lists ordered?

A skip list starts with a basic, ordered, linked list. This list is sorted, but we can't do a binary search on it because it is a linked list and we cannot index into it.

Is skip list a randomized data structure?

We already know that a sorted array will allow Ο(log n) time searches via the Binary Search algorithm.


I'm trying to implement a skip list that perform as good as a BST using a minimum additional memory overhead, at the moment even not considering any memory restriction the performance of my SkipList implementation is far away from even a very naive Balanced BST implementation -so to say, handcrafted BTS :) -

As reference I'm using the original paper from William Pugh PUG89 and the implementation i found in Algorithms in C from Sedgewick -13.5-. My code is a recursive implementation, here's the clue of the insert and find operation:

sl_node* create_node() {     short lvl {1};     while((dist2(en)<p)&&(lvl<max_level))         ++lvl;     return new sl_node(lvl); }  void insert_impl(sl_node* cur_node,         sl_node* new_node,         short lvl) {     if(cur_node->next_node[lvl]==nullptr || cur_node->next_node[lvl]->value > new_node->value){         if(lvl<new_node->lvl){             new_node->next_node[lvl] = cur_node->next_node[lvl];             cur_node->next_node[lvl] = new_node;         }         if(lvl==0) return;         insert_impl(cur_node,new_node,lvl-1);         return;     }     insert_impl(cur_node->next_node[lvl],new_node,lvl); } sl_node* insert(long p_val) {     sl_node* new_node = create_node();     new_node->value = p_val;     insert_impl(head, new_node,max_level-1);     return new_node; } 

And this is the code for the find operation:

sl_node* find_impl(sl_node* cur_node,         long p_val,         int lvl) {     if(cur_node==nullptr) return nullptr;     if(cur_node->value==p_val) return cur_node;     if(cur_node->next_node[lvl] == nullptr || cur_node->next_node[lvl]->value>p_val){         if(lvl==0) return nullptr;         return find_impl(cur_node,p_val,lvl-1);     }     return find_impl(cur_node->next_node[lvl],p_val,lvl); }  sl_node* find(long p_val) {     return find_impl(head,p_val,max_level-1); } 

A sl_node -skip list node- looks like this:

struct sl_node {     long  value;     short lvl;     sl_node** next_node;      sl_node(int l) : lvl(l)     {         next_node = new sl_node*[l];         for(short i{0};i<l;i++)             next_node[i]=nullptr;     }     ~sl_node()     {         delete[] next_node;     } }; 

As you can see this is implementation has nothing special nor advanced if compared to the book implementation, i will not share the Balaced BTS code since I don't think is needed here, but is a very basic BTS with a rebalance function triggered during the insertion when the new node height is greather than 16*lg(n) where n is the number of nodes.

So to say, i rebalance the three only if the maximum height is 16 times greather than the best theoretical one, as i said, this straighforward homemade BST performs much better than the homemade Skip List.

But first, let's have a look at some data, using p=.5 and n=262144 the level of the nodes in the SkipList has the following distribution:

1:141439, 53.9547% 2:65153, 24.8539% 3:30119, 11.4895% 4:13703, 5.22728% 5:6363, 2.42729% 6:2895, 1.10435% 7:1374, 0.524139% 8:581, 0.221634% 9:283, 0.107956% 10:117, 0.044632% 11:64, 0.0244141% 12:31, 0.0118256% 13:11, 0.00419617% 14:5, 0.00190735% 15:1, 0.00038147% 16:5, 0.00190735% 

Which almost perfectly -oh, this is big one!- match the theory from the article, that is: 50% level 1, 25% level 2 and so on and so forth. The input data came from my best available pseudo-random number generator aka std::random_device with std::default_random_engine and an uniform int distribution. The input looks pretty random to me :) !

The time required to search for 'all' the 262144 elements in the SkipList -in random order- is 315ms on my machine, whereas for the same search operation on the naive BTS the required time is 134ms, so the BTS is almost twice faster than the SkipList. That is not exactly what i was expecting from "Skip list algoriths have the same asymptotic expected time bounds as balanced trees and are simple, faster and use less space" PUG89.

The time required for the 'insertion' of the nodes is 387ms for the SkipList and 143ms for the BTS, again a naive BST perform better.

The things get little more interesting if instead of using a random sequence of input number i use a sorted sequence, here my poor homemade BST become slow, and the insertion of 262144 sorted int's takes 2866ms whereas the SkipList require only 168ms.

But, when come to the search time, the BST is still faster! for the sorted input we have 234ms versus 77ms, this BST is 3x faster.

With different values of the p-factor i got slightly different performance results:

enter image description here

enter image description here

And last but not least, the memory usage graph, which as you may expect confirm that if we increase the amount of levels per node we impact significantly the memory fingerprint. The memory usage is calculated just as a sum of the space required to store the additional pointers for all the node.

enter image description here

After all this, does anybody of you can provide me a comment on how to implement a SkipList performing as good as a BTS -not counting the additional memory overhead-?

I know about the article from DrDobbs LINK about SkipList, and i went thru all the paper, the code for the search and insert operation match exactly the original implementation from PUG89 so should be as good as mine, and the article does not provide any performance analysis anyway, I did not found any other source. Can you help me?

Any comment is highly appreciated!

Thanks! :)