I have a large text file that mainly consists of numbers and some delimiters like ,|{}[]: etc. I used Lempel-Ziv encoding for compression. The code I used is not mine; it is the one from Rosetta Code. I ran it once for line-by-line compression and once for chunk-by-chunk compression:
# 'compress' is the LZW compressor from Rosetta Code; LARGE_FILE and
# comp_file (the output file object) are defined elsewhere in the script.

def readChunk(file_object, size=1024):
    # Lazily yield fixed-size chunks of the file until EOF.
    while True:
        data = file_object.read(size)
        if not data:
            break
        yield data

def readByChunk():
    with open(LARGE_FILE, 'r') as f:
        for data in readChunk(f, 2048):
            # compress() returns a list of integer codes; serialize
            # them as space-separated decimal text.
            compressed_chunk = compress(data)
            comp_file.write(" ".join(map(str, compressed_chunk)))
def readLineByLine():
    with open(LARGE_FILE, 'r') as f:
        for data in f:  # iterate lazily instead of readlines()
            compressed_line = compress(data)
            comp_file.write(" ".join(map(str, compressed_line)))
Both functions output a file that is much bigger than the original file! Decompression works fine, i.e. I am able to get the original text back, so I think the compression itself is correct.
Am I doing something wrong in saving the file?
The compressor you are using is terrible for this purpose: compress() returns a list of integer codes, and you then write every code as space-separated decimal text, so each code costs several bytes on disk even when it stands for only a character or two of input. Try zlib.compress instead, writing its output in binary mode.
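A minimal sketch of that approach, assuming the whole file fits in memory; LARGE_FILE is the input path from the question, and compressed.bin is a hypothetical output name:

import zlib

# Read the file as raw bytes, compress with zlib, and write the
# result in binary mode.
with open(LARGE_FILE, 'rb') as f:
    data = f.read()

compressed = zlib.compress(data, 9)  # level 9 = best compression
with open('compressed.bin', 'wb') as out:
    out.write(compressed)

# Round trip: zlib.decompress restores the original bytes exactly.
assert zlib.decompress(compressed) == data

The important difference is that zlib produces bytes you store as-is in a binary file, rather than integer codes re-encoded as text.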
The general answer to "when does compression not help?" is "when the data is random bits", or already compressed; 99% of other ordinary data compresses just fine. For ASCII data like yours, really trivial compressors suffice: plain Huffman coding alone gets you a decent boost (see the sketch below), and you say you only use about a dozen unique characters.
Which means that either you have a bunch of random data you're not telling us about, or there's a bug in the compressor.
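As a rough illustration of that Huffman claim (a sketch, not the asker's code): the following computes Huffman code lengths over a made-up sample drawn from a digits-plus-delimiters alphabet, using only the standard heapq and collections modules, and compares the raw size to the coded size (ignoring the cost of storing the code table).

import heapq
from collections import Counter

def huffman_code_lengths(text):
    # Build a Huffman tree over character frequencies and return the
    # code length (in bits) assigned to each symbol.
    freq = Counter(text)
    # Heap entries: (weight, tiebreaker, {symbol: code length so far});
    # the integer tiebreaker keeps tuple comparison from reaching the dict.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        merged = {s: n + 1 for s, n in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return freq, heap[0][2]

sample = "12,34|56{78}[90]:12,34"  # made-up data in the question's alphabet
freq, lengths = huffman_code_lengths(sample)
bits = sum(freq[s] * lengths[s] for s in freq)
print(f"raw: {len(sample) * 8} bits, Huffman-coded: {bits} bits")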