Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing 1 million phone numbers [closed]

What is the most efficient way, memory-wise, to store 1 million phone numbers?

Apparently this is an interview question at Google, please give your ideas.

like image 341
algo-geeks Avatar asked Mar 10 '11 16:03

algo-geeks


People also ask

What is the most efficient way to store 1 million phone numbers memory wise?

The answer is a bitarray or bit vector.

How will you store a series of mobile nos in memory efficiently?

You store the numbers without separators to save space, one suffix after the other (eg. number 1 is suffixes+0, number 2 is suffixes+1) and you can search efficiently too, if you have it sorted.

Are we close to running out of phone numbers?

US phone numbers are laid out as area code (3 digit), prefix (3 digit), suffix (4 digit) for a total of 10 digits not including country code (+1). While it may seem we have plenty of phone numbers to go around, we really don't. Each suffix can only accommodate for 10,000 assignable numbers (0000 through 9999).


2 Answers

If memory is our biggest consideration, then we don't need to store the number at all, just the delta between i and i+1.

Now if numbers range from 200 0000 - 999 9999, then there are 7,999,999 possible phone numbers. Since we have 1-million numbers, and if we assume that they are uniformly distributed, we have an Expected distance of E = n_i+1 - n_i ~ 8 (3 bits) between sequential numbers n_i and n_i+1. So for a 32-bit int we could potentially store up to 10 sequential offsets (~400kb optimal total memory footprint), however its likely that we'll have some cases where we need an offset greater than 8 (Maybe we have 400, or 1500??). In which case, we can simply reserve the first 2 bits of the int as a header which tells us what frame size we use to read the bits it stores. For example, maybe we use: 00 = 3x10, 01 = 5x6, 10 = 7x4, 11 = 1*30.

like image 121
Rob Leclerc Avatar answered Sep 28 '22 01:09

Rob Leclerc


Write them in ASCII, space-separated.

Zip the resulting string, using your favourite compression algorithm. If order isn't important, sorting them first might help the compression, gives you more repetition closer together.

Oh, did you want efficient random access? Then you should've said.

like image 43
Steve Jessop Avatar answered Sep 28 '22 01:09

Steve Jessop