Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

I was asked this in a recent interview

I was asked to stay away from HashMap or any sort of Hashing.

The question went something like this -

Lets say you have PRODUCT IDs of up to 20 decimals, along with Product Descriptions. Without using Maps or any sort of hashing function, what's the best/most efficient way to store/retrieve these product IDs along with their descriptions?

Why is using Maps a bad idea for such a scenario?

What changes would you make to sell your solution to Amazon?

like image 490
maximus Avatar asked Nov 27 '22 23:11

maximus


2 Answers

A map is good to use when insert/remove/lookup operations are interleaved. Every operations are amortized in O(log n).

In your exemple you are only doing search operation. You may consider that any database update (inserting/removing a product) won't happen so much time. Therefore probably the interviewer want you to get the best data structure for lookup operations.

In this case I can see only some as already proposed in other answers:

  • Sorted array (doing a binary search)
  • Hasmap
  • trie

With a trie , if product ids do not share a common prefix, there is good chance to find the product description only looking at the first character of the prefix (or only the very first characters). For instance, let's take that product id list , with 125 products:

  • "1"
  • "2"
  • "3"
    ...
  • "123"
  • "124"
  • "1234567"

Let's assume you are looking for the product id titled "1234567" in your trie, only looking to the first letters: "1" then "2" then "3" then "4" will lead to the good product description. No need to read the remaining of the product id as there is no other possibilities. Considering the product id length as n , your lookup will be in O(n). But as in the exemple explained it above it could be even faster to retreive the product description. As the procduct ID is limited in size (20 characters) the trie height will be limited to 20 levels. That actually means you can consider the look up operations will never goes beyond a constant time, as your search will never goes beyong the trie height => O(1). While any BST lookups are at best amortized O(log N), N being the number of items in your tree .

While an hashmap could lead you to slower lookup as you'll need to compute an index with an hash function that is probably implemented reading the whole product id length. Plus browsing a list in case of collision with other product ids.

Doing a binary search on a sorted array, and performance in lookup operations will depends on the number of items in your database.

like image 69
yves Baumes Avatar answered Dec 10 '22 09:12

yves Baumes


A B-Tree in my opinion. Does that still count as a Map?

Mostly because you can have many items loaded at once in memory. Searching these items in memory is very fast.

like image 42
Edison Gustavo Muenz Avatar answered Dec 10 '22 08:12

Edison Gustavo Muenz