So the compression process takes a chunk of binary data A and outputs a smaller chunk of binary data B. What characteristics of B make it unable to go through this process again?
Data has a property called entropy: the amount of new information each additional bit carries. For example, 10101010101010101010 has low entropy because you don't need to see the next bit to know what it will be. A perfect compression algorithm would compress the data to maximum entropy, so that every bit carries information and none can be removed, making the output as small as it can be.
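To make that concrete, here is a minimal sketch that estimates entropy by splitting a bit string into fixed-width blocks and measuring how surprising each block is. The block width and the example strings are just illustrative choices, not part of the original answer.

import math
from collections import Counter

def block_entropy(bits: str, width: int = 4) -> float:
    """Shannon entropy, in bits per block, over fixed-width blocks of a bit string."""
    blocks = [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(block_entropy('10101010101010101010'))  # 0.0: the same block repeats, so nothing is new
print(block_entropy('10011011010000111010'))  # higher: many distinct blocks, each one a surprise

The predictable alternating string scores zero because every block is identical, while the irregular string scores high because each block adds information.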
It is not true that already-compressed data can never be compressed again. If you take a file consisting of 1 million zero characters and compress it with gzip, the resulting file is 1010 bytes. If you compress that compressed file again, it is reduced further, to just 75 bytes.
$ python
>>> f = open('0.txt', 'w')
>>> f.write('0'*1000000)
>>> f.close()
>>>
$ wc -c 0.txt
1000000 0.txt
$ gzip 0.txt
$ wc -c 0.txt.gz
1010 0.txt.gz
$ mv 0.txt.gz 0.txt
$ gzip 0.txt
$ wc -c 0.txt.gz
75 0.txt.gz
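For comparison, here is a rough equivalent of that session using Python's standard gzip module. The exact byte counts will differ a little from the command-line tool, but the second pass should still shrink the output for the same reason.

import gzip

data = b'0' * 1000000
once = gzip.compress(data)    # first pass folds the long run of zeros into roughly 1 KB
twice = gzip.compress(once)   # second pass exploits repetition left behind by the first

print(len(data), len(once), len(twice))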
The reason compressing twice usually does not work is that the compression process removes redundancy. With less redundancy left, it is harder to compress the file any further.
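As an illustration of that limit, here is a small sketch in which random bytes stand in for data whose redundancy has already been removed; a gzip pass over them saves essentially nothing and usually adds a few bytes of header overhead instead.

import gzip
import os

random_data = os.urandom(100000)          # high-entropy input with no patterns to exploit
compressed = gzip.compress(random_data)

print(len(random_data))    # 100000
print(len(compressed))     # about the same as the input, often slightly larger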