I'm receiving "order update" messages from a stock exchange. Each order id is between 1 and 100 000 000, so I can use a 100-million-element array to store the orders, and when an update is received I can look up the order very fast just by accessing it by index: array[orderId]. I will spend several gigabytes of memory, but this is OK.
Alternatively I can use hashmap, and because at any moment the number of "active" orders is limited (to, very roughly, 100 000), look-up will be pretty fast too, but probaly a little bit slower then array.
The question is: will a hashmap actually be slower? Is it reasonable to create a 100-million-element array?
I need low latency and nothing else; I completely don't care about memory. What should I choose?
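For concreteness, here is a minimal sketch of the two alternatives I'm weighing. The Order struct, the constant, and the onOrderUpdate handler are just illustrative assumptions, not my real code:

```cpp
// Sketch only: contrasts a direct-index array with a hash map keyed by order id.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

struct Order {
    std::int64_t quantity;
    double price;
};

constexpr std::size_t MAX_ORDER_ID = 100'000'000;

// Option 1: direct-index array. One slot per possible order id,
// so a lookup is a single offset: orders[orderId]. Costs several GB up front.
std::unique_ptr<Order[]> orders(new Order[MAX_ORDER_ID + 1]());

// Option 2: hash map holding only the ~100 000 "active" orders at any moment.
std::unordered_map<std::int64_t, Order> activeOrders;

void onOrderUpdate(std::int64_t orderId, std::int64_t qty, double px) {
    orders[orderId] = {qty, px};        // array: index directly into the slot
    activeOrders[orderId] = {qty, px};  // hash map: hash key, find bucket, compare
}
```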
According to a stackoverflow post, "HashMap uses an array underneath so it can never be faster than using an array correctly".
Hash Tables vs Arrays
Hash tables tend to be faster when it comes to searching for items. In arrays, you have to loop over all items before you find what you are looking for, while in a hash table you go directly to the location of the item.
Searching over a data structure such as an array presents a linear time complexity of O(n). In other words, as the data structure increases in size, the search time increases in a linear fashion. Simply put, using a hash table is faster than searching through an array.
Hashmaps use the hashcode of the key to directly access the bucket where the entry is stored. This is an O(1) access. If more than one element is in that bucket because of the same or similar hashcode, then you have a few more checks, but it's still far faster than iterating through a list and searching for an element.
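As a rough illustration of that difference (the ids and prices below are made up), compare a linear scan with a hash lookup:

```cpp
// Sketch: std::find scans elements one by one (O(n)), while
// unordered_map::find hashes the key and jumps to its bucket (expected O(1)).
#include <algorithm>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<long> ids = {42, 7, 19, 1003};           // linear container
    std::unordered_map<long, double> prices = {          // hash table
        {42, 101.5}, {7, 99.0}, {19, 100.25}, {1003, 98.75}};

    // Linear search: visits elements until it finds the key.
    auto it1 = std::find(ids.begin(), ids.end(), 1003L);

    // Hash lookup: computes hash(1003) and goes straight to that bucket.
    auto it2 = prices.find(1003L);

    return (it1 != ids.end() && it2 != prices.end()) ? 0 : 1;
}
```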
Whenever considering performance issues, one experiment is worth a thousand expert opinions. Test it!
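For example, a rough microbenchmark along these lines would let you measure both options on your own hardware. The sizes, timing loop, and access pattern below are simplified assumptions, not a rigorous harness:

```cpp
// Sketch: time 10 million lookups against a direct-index array vs. a hash map
// holding ~100 000 active ids. A real test should also vary access patterns
// and account for cache and TLB warm-up effects.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_map>
#include <vector>

int main() {
    constexpr std::size_t kMaxId   = 100'000'000;
    constexpr std::size_t kActive  = 100'000;
    constexpr std::size_t kLookups = 10'000'000;

    std::vector<double> arr(kMaxId + 1, 0.0);            // direct-index array (~800 MB)
    std::unordered_map<std::int64_t, double> map;
    map.reserve(kActive);

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<std::int64_t> dist(1, kMaxId);
    std::vector<std::int64_t> ids(kActive);
    for (auto& id : ids) { id = dist(rng); arr[id] = 1.0; map[id] = 1.0; }

    // Pre-generate a random sequence of lookups over the active ids.
    std::vector<std::int64_t> probes(kLookups);
    std::uniform_int_distribution<std::size_t> pick(0, kActive - 1);
    for (auto& p : probes) p = ids[pick(rng)];

    double sink = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    for (auto id : probes) sink += arr[id];               // array lookups
    auto t1 = std::chrono::steady_clock::now();
    for (auto id : probes) sink += map.find(id)->second;  // hash map lookups
    auto t2 = std::chrono::steady_clock::now();

    auto ns = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
    };
    std::printf("array: %.1f ns/lookup\n", double(ns(t0, t1)) / kLookups);
    std::printf("hash : %.1f ns/lookup\n", double(ns(t1, t2)) / kLookups);
    return sink > 0 ? 0 : 1;
}
```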
That said, I'll take a wild stab in the dark: it's likely that if you can convince your OS to keep your multi-gigabyte array resident in physical memory (this isn't necessarily easy - consider looking at the mlock and munlock syscalls), you'll have relatively better performance. Any such performance gain you notice (should one exist) will likely be by virtue of bypassing the cost of the hashing function, and avoiding the overheads associated with whichever collision-resolution and memory allocation strategies your hashmap implementation uses.
It's also worth cautioning that many hash table implementations have non-constant complexity for some operations (e.g., separate chaining could degrade to O(n) in the worst case). Given that you are attempting to optimize for latency, an array with very aggressive signaling to the OS memory manager (e.g., madvise and mlock) is likely to give you the closest to constant-latency lookups that you can easily get on a microprocessor.
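A hedged sketch of what that OS-level signaling might look like on Linux follows; the flags and sizes chosen here are one reasonable configuration under the question's assumptions, not the only option:

```cpp
// Sketch: lock the big array's pages into RAM with mlock() and hint the
// access pattern with madvise(). Error handling is minimal.
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t kMaxId = 100'000'000;
    const std::size_t kBytes = (kMaxId + 1) * sizeof(double);

    // Allocate the array directly with mmap so we control its pages.
    void* mem = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Access by order id is effectively random, so tell the kernel
    // not to bother with read-ahead.
    if (madvise(mem, kBytes, MADV_RANDOM) != 0) std::perror("madvise");

    // Pin the pages in physical memory so lookups never page-fault to swap.
    // Requires a sufficient RLIMIT_MEMLOCK or CAP_IPC_LOCK.
    if (mlock(mem, kBytes) != 0) std::perror("mlock");

    double* orders = static_cast<double*>(mem);
    orders[12345] = 101.5;   // constant-time update by order id

    munlock(mem, kBytes);
    munmap(mem, kBytes);
    return 0;
}
```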