 

visualization of compressed (deflated, gzipped) content structures

I have some ideas I would like to experiment with relating to data compression, but am finding it difficult to decipher some parts of how the standards are applied "in real life". I would like to look at some sample compressed files to observe how the blocks are arranged and how the Huffman tree(s) are structured.

Are there any tools in existence which can help visualize this for a given compressed file (zip/gzip/deflate etc)? I'm picturing something like a tree view or some form of graph visualizer.
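As a starting point for inspecting this by hand, here is a minimal pure-Python deflate decoder, written from RFC 1951, that records for each block its type (stored, fixed Huffman, or dynamic Huffman), whether it is the final block, and how many bytes it produced. It is not an existing tool, and every name in it (`inflate_blocks`, `read_dynamic`, `build`, `decode`, `BitReader`) is hypothetical. It does no error checking and is slow, so treat it as something to print from rather than a real inflater.

```python
# Length codes 257..285: base lengths and extra-bit counts (RFC 1951, 3.2.5).
LEN_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LEN_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
# Distance codes 0..29: base distances and extra-bit counts.
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257,
             385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289,
             16385, 24577]
DIST_EXTRA = [0, 0] + [(c - 2) // 2 for c in range(2, 30)]
# Order of the code-length code lengths in a dynamic block header.
CL_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


class BitReader:
    """Reads deflate bit fields, least significant bit of each byte first."""
    def __init__(self, data):
        self.data, self.pos, self.bit = data, 0, 0

    def bits(self, n):
        val = 0
        for i in range(n):
            val |= ((self.data[self.pos] >> self.bit) & 1) << i
            self.bit += 1
            if self.bit == 8:
                self.pos, self.bit = self.pos + 1, 0
        return val


def build(lengths):
    """Canonical Huffman code from code lengths: {(length, code): symbol}."""
    max_len = max(lengths)
    bl_count = [0] + [lengths.count(n) for n in range(1, max_len + 1)]
    next_code, code = [0] * (max_len + 1), 0
    for n in range(1, max_len + 1):
        code = (code + bl_count[n - 1]) << 1
        next_code[n] = code
    table = {}
    for sym, n in enumerate(lengths):
        if n:
            table[(n, next_code[n])] = sym
            next_code[n] += 1
    return table


def decode(br, table):
    """Read one Huffman-coded symbol (codes are packed MSB first)."""
    code = length = 0
    while (length, code) not in table:
        code, length = (code << 1) | br.bits(1), length + 1
    return table[(length, code)]


FIXED_LIT = build([8] * 144 + [9] * 112 + [7] * 24 + [8] * 8)
FIXED_DIST = build([5] * 30)


def read_dynamic(br):
    hlit, hdist, hclen = br.bits(5) + 257, br.bits(5) + 1, br.bits(4) + 4
    cl_lengths = [0] * 19
    for i in range(hclen):
        cl_lengths[CL_ORDER[i]] = br.bits(3)
    cl_table = build(cl_lengths)
    lengths = []
    while len(lengths) < hlit + hdist:
        sym = decode(br, cl_table)
        if sym < 16:
            lengths.append(sym)
        elif sym == 16:  # repeat the previous length 3-6 times
            lengths += [lengths[-1]] * (3 + br.bits(2))
        elif sym == 17:  # 3-10 zero lengths
            lengths += [0] * (3 + br.bits(3))
        else:  # 11-138 zero lengths
            lengths += [0] * (11 + br.bits(7))
    return build(lengths[:hlit]), build(lengths[hlit:hlit + hdist])


def inflate_blocks(raw):
    """Decode a raw deflate stream; return (data, list of block summaries)."""
    br, out, blocks, final = BitReader(raw), bytearray(), [], 0
    while not final:
        final, btype = br.bits(1), br.bits(2)
        start = len(out)
        if btype == 0:  # stored: skip to a byte boundary, then LEN, NLEN, data
            if br.bit:
                br.pos, br.bit = br.pos + 1, 0
            n = int.from_bytes(raw[br.pos:br.pos + 2], "little")
            out += raw[br.pos + 4:br.pos + 4 + n]
            br.pos += 4 + n
        elif btype in (1, 2):
            lit, dist = (FIXED_LIT, FIXED_DIST) if btype == 1 else read_dynamic(br)
            while (sym := decode(br, lit)) != 256:  # 256 = end of block
                if sym < 256:
                    out.append(sym)  # literal byte
                    continue
                i = sym - 257
                length = LEN_BASE[i] + br.bits(LEN_EXTRA[i])
                d = decode(br, dist)
                distance = DIST_BASE[d] + br.bits(DIST_EXTRA[d])
                for _ in range(length):  # byte-by-byte copy handles overlaps
                    out.append(out[-distance])
        else:
            raise ValueError("reserved block type 3")
        blocks.append({"type": ("stored", "fixed", "dynamic")[btype],
                       "final": bool(final), "bytes_out": len(out) - start})
    return bytes(out), blocks
```

It expects a raw deflate stream. Python's `gzip.compress` writes a minimal 10-byte header and an 8-byte CRC32/size trailer, so `inflate_blocks(gz[10:-8])` should work on its output; files made by the `gzip` command line usually also store the original file name in the header, so there the header length varies and has to be parsed per RFC 1952. The code lengths of each dynamic block's Huffman trees are available inside `read_dynamic` if you want to print or draw them.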

asked Aug 06 '10 at 20:08 by Brady Moritz



People also ask

How to read the decompressed data from a gzip file?

    import gzip
    compressed = open('alice.txt.gz', 'rb')
    gzip_file = gzip.GzipFile(fileobj=compressed)

In the above code, we save the open file object as compressed before handing it to GzipFile. That way, as we read the decompressed data out of gzip_file, we'll be able to use the tell() method on compressed to see how far we are through the compressed file.
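A minimal sketch of that idea: read the decompressed data in fixed-size chunks and, after each chunk, record compressed.tell(). The function name `tell_profile` and its `chunk_size` parameter are hypothetical. One caveat: GzipFile reads ahead from the underlying file in buffered chunks (the buffer size depends on the Python version), so the compressed offset advances in coarse steps rather than byte by byte.

```python
import gzip


def tell_profile(compressed, chunk_size=4096):
    """Pair decompressed offsets with compressed offsets while reading gzip data.

    `compressed` is an open binary file object, e.g. open('alice.txt.gz', 'rb').
    Returns a list of (decompressed_bytes_so_far, compressed.tell()) tuples.
    """
    gzip_file = gzip.GzipFile(fileobj=compressed)
    points, total = [], 0
    while chunk := gzip_file.read(chunk_size):
        total += len(chunk)
        # GzipFile reads ahead, so this offset moves in buffer-sized steps.
        points.append((total, compressed.tell()))
    return points
```

For example, `with open('alice.txt.gz', 'rb') as f: points = tell_profile(f)` gives one (x, y) pair per chunk, ready to hand to a plotting library.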

How do you visualize a compressed and uncompressed file?

So it makes sense that whatever I visualize should show the position in the uncompressed data along the X axis and the compressed size so far along the Y axis. An uncompressed file would then simply be a diagonal line (Y = X).
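One way to get finer-grained (X, Y) points than tell() allows is a different technique, swapped in here: drive zlib's incremental decompressor directly, feeding it the compressed bytes a few at a time and recording how much output has come out so far. The `exact_profile` name and `step` parameter are hypothetical; `wbits=31` is the zlib setting that expects a gzip header and trailer.

```python
import zlib


def exact_profile(gz_bytes, step=64):
    """Return (uncompressed_offset, compressed_offset) points for gzip data.

    X is how many original bytes have been recovered; Y is how many compressed
    bytes were needed to recover them.
    """
    inflater = zlib.decompressobj(wbits=31)  # 31 = 16 + 15: gzip wrapper
    points, produced = [(0, 0)], 0
    for start in range(0, len(gz_bytes), step):
        chunk = gz_bytes[start:start + step]
        produced += len(inflater.decompress(chunk))
        points.append((produced, start + len(chunk)))
    return points
```

A smaller `step` gives a smoother curve at the cost of more points. As a sanity check of the diagonal, data compressed with `gzip.compress(data, compresslevel=0)` (stored blocks only) should trace roughly Y = X, offset by the header and a few bytes of overhead per block.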

Is there a tool for visualizing gzip files in Python?

There may be readily available tools for visualizing this, but I didn't find anything. Since I know gzip is implemented in the Python standard library, and I'm familiar with Python plotting libraries, I thought I would try to make my own visualization. This blog post (which is in fact just a Jupyter notebook) is the result.

Why is the plot piecewise in gzip?

The piecewise nature of the plot is simply due to how differently gzip performs on the two types of data. The first segment is text data, so the slope is shallow. The second segment is random data, which does not compress well, so the slope is steep. This pattern alternates across all the segments.
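The per-segment slopes can be measured with a small experiment: compress alternating segments into one gzip stream, force out all pending output after each segment with Z_SYNC_FLUSH, and divide each segment's compressed bytes by its length. `segment_slopes` is a hypothetical helper and the inputs in the test are made up for illustration. The first segment's count also includes the 10-byte gzip header, and every flush adds a few bytes of marker.

```python
import zlib


def segment_slopes(segments, level=6):
    """Compressed bytes per input byte for each segment of one gzip stream."""
    deflater = zlib.compressobj(level, zlib.DEFLATED, 31)  # 31: gzip wrapper
    slopes = []
    for seg in segments:
        # Z_SYNC_FLUSH emits everything for this segment without resetting
        # the 32 KiB history window, so later segments can still match earlier
        # ones.
        out = deflater.compress(seg) + deflater.flush(zlib.Z_SYNC_FLUSH)
        slopes.append(len(out) / len(seg))
    deflater.flush()  # finish the stream (gzip trailer); not counted
    return slopes
```

Text segments should come out well below 1 and random segments at or slightly above 1, which is exactly the shallow/steep alternation described above. Note that repeating the *same* random bytes would not give a steep slope the second time, because deflate can match them from the history window.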


2 Answers

You might be interested in this (if you are still interested, that is :-P)

http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equals-awesome/

answered Oct 30 '22 at 23:10 by Ezra



I made an "entropy image" tool.

The entropy_image tool replaces each pixel with the (estimated) number of bits necessary to encode that pixel using range coding or Huffman compression.
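The entropy_image tool itself is not reproduced here. As a rough sketch of the idea, assuming a static order-0 model (each pixel value's probability is its frequency in the whole image, a simplification of what an adaptive range coder would do), the per-pixel cost can be estimated either as -log2(p) or as the length of that value's Huffman code. The helpers `bit_cost_map` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code over `counts`."""
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, tiebreak, {symbol: depth within this subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def bit_cost_map(pixels, method="entropy"):
    """Replace each pixel value with its estimated coding cost in bits."""
    counts = Counter(p for row in pixels for p in row)
    total = sum(counts.values())
    if method == "huffman":
        lengths = huffman_code_lengths(counts)
        return [[lengths[p] for p in row] for row in pixels]
    # Self-information under a static order-0 model: -log2(p).
    return [[-math.log2(counts[p] / total) for p in row] for row in pixels]
```

Rendering the returned grid as a grayscale image (brighter for more bits) gives a picture of where the "expensive" pixels are. The Huffman costs are whole numbers of bits, while -log2(p) is the fractional ideal that range coding can approach.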

I hope this isn't the only compression visualization tool in the world.

answered Oct 30 '22 at 23:10 by David Cary
