Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best Data Structure for The Following Constraints?

Here are some constraints for a data structure I need. It seems like none of the common data structures (I will mention the ones I've thought of below) fit these all that well. Can anyone suggest one that I maybe haven't thought of?

  1. I need to be able to perform lookups by unsigned integer keys.
  2. The items to be stored are user-defined structs.
  3. These indices will be sparse, usually extremely so. Regular arrays are out.
  4. The frequency of each index will have a non-uniform distribution, with small indices being much more frequent than large indices.
  5. N will usually be small, probably no larger than 5 or 10, but I don't want to rely on that too heavily because it might occasionally be much larger.
  6. The constant term matters a lot. I need really fast lookups when N is small. I already tried generic hash tables and, empirically, they are too slow, even when N=1, meaning no collisions, probably because of the amount of indirection involved. However, I'd be open to suggestions about specialized hash tables that take advantage of other constraints mentioned.
  7. Insertion time is not important as long as retrieval time is fast. Even O(N) insertion time is good enough.
  8. Space efficiency is not terribly important, though it is important enough not to just use regular arrays.
like image 457
dsimcha Avatar asked Mar 01 '09 20:03

dsimcha


People also ask

Which of the following is the best data structure?

An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays.

What is the best data structure which can be used to store the collection of strings?

Trie. A trie, also known as a keyword tree, is a data structure that stores strings as data items that can be organized in a visual graph.

Which data structure is best for deletion?

To answer your question, if time to delete is the most important perspective, the LinkedList should be chosen.


1 Answers

When N is small a simple array or single linked list with key + value as payload is very efficient. Even if it is not the best when N gets larger.

You get O(N) lookup time which means lookups take k * N time. A O(1) lookup takes a constant K time. So you get better performance with O(N) for N < K/k. Here k is very small so you can get to interesting values of N. Remember that the Big O notation only describes behavior for large Ns, not what you are after. For small tables

void *lookup(int key_to_lookup)
{
  int n = 0;
  while (table_key[n] != key_to_lookup)
    n++;
  return table_data[n];
}

can be hard to beat.

Benchmark your hash tables, balanced tree and simple array/linked list and see at which values of N they each start to be better. Then you will know which is better for you.

I almost forgot: keep the frequently accessed keys at the beginning of your array. Given your description that means keep it sorted.

like image 86
kmkaplan Avatar answered Sep 21 '22 21:09

kmkaplan