
When does compression increase file size?

I have a large text file that consists mainly of numbers and a few delimiters such as ,|{}[]: and so on. I used Lempel-Ziv encoding for compression. The code is not mine; it is the one from Rosetta Code. I ran it once compressing line by line and once compressing chunk by chunk:

def readChunk(file_object, size = 1024):
    while True:
        data = file_object.read(size)
        if not data:
            break
        yield data

def readByChunk():
    with open(LARGE_FILE, 'r') as f:
        for data in readChunk(f, 2048):
            compressed_chunk = compress(data)
            compressed_chunk = map(lambda a : str(a), compressed_chunk)
            comp_file.write(" ".join(compressed_chunk))

def readLineByLine():
    with open(LARGE_FILE, 'r') as f:
        lines = f.readlines()
        for data in lines:
            compressed_line = compress(data)
            compressed_line = map(lambda a : str(a), compressed_line)
            comp_file.write(" ".join(compressed_line))

Both functions output a file that is much bigger than the original file! Decompression works fine, i.e. I am able to get the original text back, so I think the code is correct.

Am I doing something wrong in saving the file?

Animesh Pandey asked Feb 20 '26 03:02



2 Answers

The compressor you are using is terrible. Try zlib.compress instead.
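A minimal sketch of that suggestion, assuming Python 3 and files opened in binary mode. The helper names `compress_file` and `decompress_file`, the chunk size, and the compression level are illustrative choices, not part of the original code. Note also that the code in the question writes every output code as decimal text joined with spaces, so a single code can occupy several bytes on disk; writing zlib's raw bytes avoids that. A single compressor object is kept for the whole file so that matches can span chunk boundaries.

```python
import zlib


def compress_file(src_path, dst_path, chunk_size=64 * 1024, level=9):
    """Stream src_path through one zlib compressor and write raw bytes to dst_path."""
    compressor = zlib.compressobj(level)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # compress() may return b"" while zlib buffers input internally.
            dst.write(compressor.compress(chunk))
        # flush() emits whatever is still buffered and ends the stream.
        dst.write(compressor.flush())


def decompress_file(src_path, dst_path, chunk_size=64 * 1024):
    """Inverse of compress_file: stream a zlib file back to its original bytes."""
    decompressor = zlib.decompressobj()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(decompressor.decompress(chunk))
        dst.write(decompressor.flush())
```

For a file that fits in memory, `zlib.compress(data)` on the whole byte string does the same job in one call; the streaming form above is only there because the question mentions a large file.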

Mark Adler answered Feb 22 '26 00:02



The general answer is "when the data is random bits" or already compressed. Almost everything else compresses just fine. For ASCII data (like the data you say you are using), really trivial compressors suffice: Huffman coding alone gives a decent reduction, and you say you only use about a dozen unique characters.

That means either you have a bunch of random data that you're not telling us about, or there's a bug in the compressor.
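The claim can be checked with a rough experiment. The sketch below is only an illustration under stated assumptions: the 17-character alphabet and the synthetic samples stand in for the file in the question, which is not available, and zlib is used as the reference compressor. It estimates the per-character entropy (the limit for a symbol-by-symbol coder such as Huffman) and compares how zlib fares on small-alphabet text versus random bytes.

```python
import math
import os
import random
import zlib
from collections import Counter


def entropy_bits_per_symbol(data: bytes) -> float:
    """Order-0 Shannon entropy: a lower bound for symbol-by-symbol coding."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


# Synthetic stand-in for the question's data: digits plus a few delimiters,
# drawn uniformly at random, so there is no structure beyond the small alphabet.
rng = random.Random(0)
alphabet = b"0123456789,|{}[]:"
text_like = bytes(rng.choice(alphabet) for _ in range(100_000))

# Random bytes: every byte value is equally likely, so nothing can be squeezed out.
random_bytes = os.urandom(100_000)

for name, sample in (("text-like", text_like), ("random", random_bytes)):
    print(name,
          f"entropy={entropy_bits_per_symbol(sample):.2f} bits/char",
          f"zlib ratio={compression_ratio(sample):.3f}")
```

Each character is stored in 8 bits but carries only about log2(17) bits of information here, which is why even this structureless sample shrinks to roughly half its size, while the random bytes do not shrink at all. Real numeric data usually has repeated patterns on top of the small alphabet and tends to compress further.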

U2EF1 answered Feb 22 '26 00:02



