
How can I tail a zipped file without reading its entire contents?

I want to emulate the functionality of gzcat | tail -n.

This would be helpful when the files are huge (a few GB or so). Can I tail the last few lines of such a file without reading it from the beginning? I doubt this is possible, since I'd guess that for gzip the encoding depends on all of the preceding text.

Still, I'd like to hear whether anyone has tried something similar, perhaps by investigating a compression algorithm that could provide such a feature.

baskin asked Jul 25 '09 20:07




2 Answers

No, you can't. The gzip algorithm works on streams and adapts its internal codings to what the stream contains in order to achieve its high compression ratio.

Without knowing the contents of the stream before a certain point, it's impossible to know how to decompress from that point on.

Any algorithm that allows you to decompress arbitrary parts of the data will require multiple passes over the data to compress it.
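For reference, here is a minimal Python sketch of the full-read fallback, roughly what gzcat file | tail -n does. It still decompresses the whole file from the beginning, but it keeps only the last n lines in memory. The function name tail_gzip and the UTF-8 assumption are illustrative, not part of any existing tool.

```python
import gzip
from collections import deque


def tail_gzip(path, n=10):
    """Return the last n lines of a gzip file as a list of strings.

    The whole stream is still decompressed from the start; only the
    memory use is bounded, because the deque drops older lines.
    Assumes the content is text that decodes as UTF-8 (an assumption).
    """
    if n <= 0:
        return []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))
```

Calling tail_gzip("access.log.gz", 20) (a hypothetical file name) would return the last 20 lines, at the cost of one sequential pass over the compressed data.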

Ben S answered Sep 18 '22 11:09


BGZF is a blocked, gzip-compatible format used for the indexed, compressed BAM files created by Samtools. These files are randomly accessible.

http://samtools.sourceforge.net/
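The general idea behind blocked formats can be sketched in Python: compress the data as a series of independent gzip members and remember where each member starts, so that a tail only has to decompress the trailing blocks. This is an illustration of the principle only, not the BGZF format itself (which records block sizes inside the gzip header). The names write_blocked and tail_blocked, and the in-memory offsets list standing in for an index file, are assumptions made for the example.

```python
import gzip
import io


def write_blocked(lines, block_lines=1000):
    """Compress lines as independent gzip members.

    Returns (data, offsets), where offsets[i] is the byte position at
    which member i starts. The concatenation is still a valid gzip stream.
    """
    buf = io.BytesIO()
    offsets = []
    for start in range(0, len(lines), block_lines):
        offsets.append(buf.tell())  # where this member begins
        chunk = "".join(lines[start:start + block_lines]).encode("utf-8")
        buf.write(gzip.compress(chunk))
    return buf.getvalue(), offsets


def tail_blocked(data, offsets, n=10):
    """Return the last n lines, decompressing only trailing blocks."""
    if n <= 0:
        return []
    collected = []
    # Walk the blocks from last to first until enough lines are gathered.
    for i in range(len(offsets) - 1, -1, -1):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        block = gzip.decompress(data[offsets[i]:end]).decode("utf-8")
        collected = block.splitlines(keepends=True) + collected
        if len(collected) >= n:
            break
    return collected[-n:]
```

The trade-off is a somewhat lower compression ratio, since each block starts with no knowledge of the data before it, in exchange for being able to start decompressing at any block boundary.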

Jeremy Leipzig answered Sep 22 '22 11:09