Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash table vs. Sorted Array - which to use?

Suppose I have a set of data (unsorted) that I want to store for quick lookup. I don't know what the size is before I load the data and I should load it all at once so I can start performing lookups right away.

In addition, at any time during program execution more data may be presented to me to be stored in the data structure I choose.

Should I use a hash table or sorted array to store this data? Obviously a static hash table would require being made at runtime according to the size of data presented - would this be enough of a disadvantage that I should simply sort the data given to me, even though it would be O(NlogN) instead of O(N)? Or should I be considering some method of dynamic hashing?

Clarification: I need to load data of arbitrary size and then perform search and insertions on the data, with no clear order or idea of the amount of searching/inserting I'll have to do.

I know that's really general...but what if I have to do more insertions after loading the data than searches? What about more searches than insertions?

like image 228
riggspc Avatar asked Dec 08 '22 16:12

riggspc


1 Answers

This really depends on the frequency of the operations.

  • If you do a lot of insertions relative to the number of lookups, then a sorted array is probably not a good option because inserting into a sorted array is expensive (O(n) time). A binary search tree or hash table might be appropriate here.

  • If you do a huge number of lookups relative to the number of insertions, then a sorted array might be a good idea, though a hash table is likely to be faster. Sorted arrays are usually a good choice when you need the data to be in sorted order to do operations like range searches or nearest-neighbor lookups, but if you don't need to do that, it's probably not appropriate.

  • If your keys are of certain types (integers, strings, etc.) you might be able to use a more specific data structure like a trie or van Emde Boas tree to get extra performance. These are sometimes better choices than hash tables or sorted arrays because they can take advantage of the specifics of your data.

If you honestly don't know what's going to happen, I would use a hash table as an initial implementation. It's unlikely to be a bad choice, though there might be a more fine-tuned data structure you could use instead. The sorted array is unlikely to be a good idea if you don't know the usage pattern in advance.

Hope this helps!

like image 63
templatetypedef Avatar answered Dec 11 '22 09:12

templatetypedef