What is the most efficient way, memory-wise, to store 1 million phone numbers?
Apparently this is an interview question at Google, please give your ideas.
The answer is a bitarray or bit vector.
You store the numbers without separators to save space, one suffix after the other (eg. number 1 is suffixes+0, number 2 is suffixes+1) and you can search efficiently too, if you have it sorted.
US phone numbers are laid out as area code (3 digit), prefix (3 digit), suffix (4 digit) for a total of 10 digits not including country code (+1). While it may seem we have plenty of phone numbers to go around, we really don't. Each suffix can only accommodate for 10,000 assignable numbers (0000 through 9999).
If memory is our biggest consideration, then we don't need to store the number at all, just the delta between i and i+1.
Now if numbers range from 200 0000 - 999 9999, then there are 7,999,999 possible phone numbers. Since we have 1-million numbers, and if we assume that they are uniformly distributed, we have an Expected distance of E = n_i+1 - n_i ~ 8 (3 bits) between sequential numbers n_i and n_i+1. So for a 32-bit int we could potentially store up to 10 sequential offsets (~400kb optimal total memory footprint), however its likely that we'll have some cases where we need an offset greater than 8 (Maybe we have 400, or 1500??). In which case, we can simply reserve the first 2 bits of the int as a header which tells us what frame size we use to read the bits it stores. For example, maybe we use: 00 = 3x10, 01 = 5x6, 10 = 7x4, 11 = 1*30.
Write them in ASCII, space-separated.
Zip the resulting string, using your favourite compression algorithm. If order isn't important, sorting them first might help the compression, gives you more repetition closer together.
Oh, did you want efficient random access? Then you should've said.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With