What is the most compact way to write 1,000,000 ints (0, 1, 2...) to file using Python without zipping etc? My answer is: 1,000,000 * 3 bytes using struct module, but it seems like interviewer expected another answer...
Edit. Numbers from 1 to 1,000,000 in random order (so transform like 5, 6, 7 -> 5-7 can be applied in rare case). You can use any writing method you know, but the resulting file should have minimum size.
Actually, you can do A LOT better than 2.5MB, since not all orderings are possible. One might argue that beating 5% would involve compression, since one is not storing the sequence itself. Basically, you would want to store the canonical sequence number. 8 numbers from 0-7 in random order normally takes 24 bits (log(8^8)/log(2)
), but with a canonical sequence number it would take 16 bits (log(8!)/log(2)
).
Basically, this involves coming up with an algorithm which can translate any sequence of integers into a giant number. Example of a possible numbering for 8 number sequence would be ordering by value:
01234567 : 0
01234576 : 1
01234657 : 2
01234675 : 3
01234756 : 4
01234765 : 5
...
The cost of this strategy is log(1000000!)/log(2)
(i.e., log_2(1000000!)
).
The standard solution usually costs about log(1000000^1000000)/log(2)
.
You can also squeeze a tiny bit more space by treating 0000 0000 1111 1111
and 1111 1111
differently, but the amount of space saved by doing so is incredibly tiny.
Edit: A quick and dirty calculation indicates this optimization brings the size down to about 2.204MiB.
Due to the pigeonhole principle, I do not believe it is possible to do better than this strategy, regardless of whether you use compression or some other technique.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With