Best Data Structure for The Following Constraints?

Tags: performance, language-agnostic, optimization, data-structures

Here are some constraints for a data structure I need. It seems like none of the common data structures (I will mention the ones I've thought of below) fit these all that well. Can anyone suggest one that I maybe haven't thought of?

1. I need to be able to perform lookups by unsigned integer keys.
2. The items to be stored are user-defined structs.
3. These indices will be sparse, usually extremely so. Regular arrays are out.
4. The frequency of each index will have a non-uniform distribution, with small indices being much more frequent than large indices.
5. N will usually be small, probably no larger than 5 or 10, but I don't want to rely on that too heavily because it might occasionally be much larger.
6. The constant term matters a lot. I need really fast lookups when N is small. I already tried generic hash tables and, empirically, they are too slow, even when N=1, meaning no collisions, probably because of the amount of indirection involved. However, I'd be open to suggestions about specialized hash tables that take advantage of other constraints mentioned.
7. Insertion time is not important as long as retrieval time is fast. Even O(N) insertion time is good enough.
8. Space efficiency is not terribly important, though it is important enough not to just use regular arrays.

asked Mar 01 '09 20:03

dsimcha

1 Answer

When N is small, a simple array or singly linked list with key + value as payload is very efficient, even if it is not the best choice when N gets larger.

You get O(N) lookup time, which means a lookup takes about k * N time, where k is the cost of one step of the scan. An O(1) lookup takes a constant time K. So the O(N) structure performs better whenever N < K/k. Here k is very small, so the break-even point can land at interesting values of N. Remember that Big O notation only describes behavior for large N, which is not what you are after. For small tables

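#include <stddef.h>

extern unsigned table_key[];  /* keys, most frequently used first */
extern void *table_data[];    /* table_data[n] belongs to table_key[n] */
extern size_t table_size;     /* entries in use (name assumed here) */

/* Linear scan; returns NULL when the key is absent. */
void *lookup(unsigned key_to_lookup)
{
  for (size_t n = 0; n < table_size; n++)
    if (table_key[n] == key_to_lookup)
      return table_data[n];
  return NULL;
}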

can be hard to beat.

Benchmark your hash tables, a balanced tree, and a simple array or linked list, and see at which values of N each one starts to win. Then you will know which is better for you.
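
One possible shape for such a benchmark, as a rough sketch only: the table layout, the names (linear_lookup, fill_table, time_lookups) and the capacity are all assumptions made for illustration, and the key pattern is uniform, whereas a real measurement should replay your own skewed key distribution.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical table layout for this sketch: parallel key/value arrays. */
enum { MAX_ENTRIES = 1024 };
static unsigned table_key[MAX_ENTRIES];
static void *table_data[MAX_ENTRIES];
static size_t table_size;

/* Linear scan; returns NULL when the key is absent. */
static void *linear_lookup(unsigned key)
{
    for (size_t n = 0; n < table_size; n++)
        if (table_key[n] == key)
            return table_data[n];
    return NULL;
}

/* Fill the table with n entries (n <= MAX_ENTRIES), keys 0, 3, 6, ... */
static void fill_table(size_t n)
{
    static int payload[MAX_ENTRIES];
    table_size = n;
    for (size_t i = 0; i < n; i++) {
        table_key[i] = (unsigned)(3 * i);
        table_data[i] = &payload[i];
    }
}

/* Time `reps` lookups through any lookup function; returns CPU seconds.
   The keys cycle uniformly over the table, which is an assumption. */
static double time_lookups(void *(*lookup)(unsigned), unsigned long reps)
{
    void *volatile sink = NULL;   /* stops the compiler from dropping the calls */
    if (table_size == 0)
        return 0.0;
    clock_t start = clock();
    for (unsigned long r = 0; r < reps; r++)
        sink = lookup(table_key[r % table_size]);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare structures, call fill_table for each N of interest (1, 2, 5, 10, 100, ...), then pass each candidate lookup function to time_lookups with the same number of repetitions and note where the ordering flips.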

I almost forgot: keep the frequently accessed keys at the beginning of your array. Given your description (small indices are looked up much more often than large ones), that means keeping it sorted by key in ascending order.
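
A minimal sketch of that sorted layout, assuming a fixed-capacity array of key + value entries. The names (struct entry, sorted_insert, sorted_lookup) and the capacity are hypothetical, not something from the question. Insertion is O(N), which the question says is acceptable, and lookups meet the small, frequent keys first.

```c
#include <stddef.h>
#include <string.h>

enum { SORTED_MAX = 64 };   /* hypothetical capacity for this sketch */

struct entry {
    unsigned key;
    void *value;
};

static struct entry sorted_table[SORTED_MAX];
static size_t sorted_count;

/* O(N) insert that keeps entries in ascending key order.
   Returns 0 on success, -1 if the table is full or the key already exists. */
static int sorted_insert(unsigned key, void *value)
{
    size_t pos = 0;
    if (sorted_count == SORTED_MAX)
        return -1;
    while (pos < sorted_count && sorted_table[pos].key < key)
        pos++;
    if (pos < sorted_count && sorted_table[pos].key == key)
        return -1;
    /* Shift the tail up by one slot to open a gap at pos. */
    memmove(&sorted_table[pos + 1], &sorted_table[pos],
            (sorted_count - pos) * sizeof sorted_table[0]);
    sorted_table[pos].key = key;
    sorted_table[pos].value = value;
    sorted_count++;
    return 0;
}

/* Linear scan from the small keys; stops once stored keys exceed the target. */
static void *sorted_lookup(unsigned key)
{
    for (size_t n = 0; n < sorted_count && sorted_table[n].key <= key; n++)
        if (sorted_table[n].key == key)
            return sorted_table[n].value;
    return NULL;
}
```

The early exit in sorted_lookup saves work on misses but adds a comparison per step, so whether it pays off for hits is something to measure rather than assume.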

answered Sep 21 '22 21:09

kmkaplan
