
Why can data be compressed only once?

Tags: compression

So the compression process takes a chunk of binary data A and outputs a smaller chunk of binary data B. What characteristics of B make it unable to go through this process again?

Gordon Gustafson asked Jul 07 '10 20:07


2 Answers

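Data has a property called entropy: roughly, the amount of new information each additional bit carries. For example, 10101010101010101010 has low entropy because you can predict each bit from the ones before it, without reading it. An ideal compression algorithm would produce output at maximum entropy, where every bit carries information that cannot be predicted from the rest. No bit can then be removed without losing information, so the output is already as small as it can be and a second pass has nothing left to remove.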
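To make the idea of "predictability" concrete, here is a minimal sketch that estimates how much information each bit carries once the previous bit is known (a first-order conditional entropy). The function name `conditional_entropy` and the two sample strings are illustrative assumptions, not part of the original answer, and this simple estimate only captures dependence on the immediately preceding bit.

```python
import math
import random
from collections import Counter

def conditional_entropy(bits):
    """Estimate H(next bit | previous bit) in bits per symbol.

    Uses observed frequencies of adjacent pairs; this is only a
    first-order estimate, assumed here for illustration.
    """
    pairs = Counter(zip(bits, bits[1:]))   # counts of (previous, next)
    prev = Counter(bits[:-1])              # counts of each previous symbol
    total = len(bits) - 1
    h = 0.0
    for (a, _b), n in pairs.items():
        p_pair = n / total                 # P(previous, next)
        p_cond = n / prev[a]               # P(next | previous)
        h -= p_pair * math.log2(p_cond)
    return h

# A perfectly alternating pattern: the next bit is fully determined.
alternating = "10" * 1000

# Pseudo-random bits (fixed seed so the run is repeatable).
rng = random.Random(0)
noisy = "".join(rng.choice("01") for _ in range(2000))

print("alternating:", conditional_entropy(alternating))
print("pseudo-random:", conditional_entropy(noisy))
```

The alternating string scores zero because every bit is determined by its predecessor, while the pseudo-random string scores close to one bit per bit, which is the situation a compressor cannot improve on.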

murgatroid99 answered Jan 04 '23 03:01


It is not true that data that is already compressed cannot be compressed again. If you take a file consisting of 1 million zeros and compress it using gzip, the resulting compressed file is 1010 bytes. If you compress the compressed file again it is further reduced to just 75 bytes.

$ python
>>> f = open('0.txt', 'w')
>>> f.write('0'*1000000)
>>> f.close()
>>> exit()
$ wc -c 0.txt
1000000 0.txt

$ gzip 0.txt
$ wc -c 0.txt.gz
1010 0.txt.gz

$ mv 0.txt.gz 0.txt
$ gzip 0.txt
$ wc -c 0.txt.gz
75 0.txt.gz

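Compression is unlikely to work a second time because the first pass removes redundancy, and with less redundancy left there is less for a second pass to exploit. The file of zeros is an exception: the input is so extreme that gzip's output is itself highly repetitive, which leaves the second pass something to work with.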
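The same experiment can be repeated in-process. The sketch below uses Python's `zlib` module rather than the `gzip` command, so the exact byte counts will differ from the ones above (the container headers are different); the helper name `sizes` and the choice of three rounds are assumptions made for illustration.

```python
import os
import zlib

def sizes(data, rounds=3):
    """Return [original size, size after 1 pass, after 2 passes, ...]."""
    out = [len(data)]
    for _ in range(rounds):
        data = zlib.compress(data, 9)  # level 9 = best compression
        out.append(len(data))
    return out

# Highly redundant input: one million '0' characters.
zeros = b"0" * 1_000_000

# Input with essentially no redundancy: random bytes from the OS.
random_bytes = os.urandom(1_000_000)

print("zeros: ", sizes(zeros))
print("random:", sizes(random_bytes))
```

On the redundant input the first two passes each shrink the data, whereas the random input does not shrink at all and picks up a few bytes of framing overhead on every pass.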

Mark Byers answered Jan 04 '23 03:01