Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Write 1,000,000 ints to file

Tags:

python

What is the most compact way to write 1,000,000 ints (0, 1, 2...) to file using Python without zipping etc? My answer is: 1,000,000 * 3 bytes using struct module, but it seems like interviewer expected another answer...

Edit. Numbers from 1 to 1,000,000 in random order (so transform like 5, 6, 7 -> 5-7 can be applied in rare case). You can use any writing method you know, but the resulting file should have minimum size.

like image 735
Vladimir Mihailenco Avatar asked Jan 21 '23 03:01

Vladimir Mihailenco


1 Answers

Actually, you can do A LOT better than 2.5MB, since not all orderings are possible. One might argue that beating 5% would involve compression, since one is not storing the sequence itself. Basically, you would want to store the canonical sequence number. 8 numbers from 0-7 in random order normally takes 24 bits (log(8^8)/log(2)), but with a canonical sequence number it would take 16 bits (log(8!)/log(2)).

Basically, this involves coming up with an algorithm which can translate any sequence of integers into a giant number. Example of a possible numbering for 8 number sequence would be ordering by value:

01234567 : 0  
01234576 : 1  
01234657 : 2  
01234675 : 3  
01234756 : 4  
01234765 : 5  
...

The cost of this strategy is log(1000000!)/log(2) (i.e., log_2(1000000!)).
The standard solution usually costs about log(1000000^1000000)/log(2) .

You can also squeeze a tiny bit more space by treating 0000 0000 1111 1111 and 1111 1111 differently, but the amount of space saved by doing so is incredibly tiny.

Edit: A quick and dirty calculation indicates this optimization brings the size down to about 2.204MiB.

Due to the pigeonhole principle, I do not believe it is possible to do better than this strategy, regardless of whether you use compression or some other technique.

like image 182
Brian Avatar answered Feb 09 '23 04:02

Brian