I have a .bz2 file. I want to list its first or last 10 lines without decompressing it, as it is too big. I tried head -10 and tail -10, but I see gibberish. I also need to compare two compressed files to check whether they are similar. How can I achieve this without decompressing the files?
EDIT: Similar means identical (have the same content).
bzip2 is a block-based compression algorithm, so in theory you could find and decompress just the particular blocks you want. In practice this would be complicated (e.g. what if the last ten lines you ultimately want to see span two or more compressed blocks?).
To answer your immediate question, you can do the following. It still decompresses the data (all of it for tail; head exits after ten lines, which ends bzcat early), so it is in a sense wasteful, but it never stores the decompressed file anywhere, so you don't run into storage capacity issues:
bzcat file.bz2 | head -10
bzcat file.bz2 | tail -10
If your distribution doesn't include bzcat (which would be a bit unusual in my experience), bzcat is equivalent to bzip2 -d -c.
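As a rough sketch of that fallback, the helper below prints the first lines of a compressed file using bzcat when it is on the PATH and bzip2 -d -c otherwise. The function name bz_head and its default of 10 lines are my own choices for illustration, not part of any standard tool.

```shell
# bz_head FILE [LINES]: hypothetical helper, not a standard command.
# Prints the first LINES lines (default 10) of a bzip2-compressed file
# without writing the decompressed data to disk.
bz_head() {
    file=$1
    lines=${2:-10}
    if command -v bzcat >/dev/null 2>&1; then
        bzcat "$file" | head -n "$lines"
    else
        # Same behaviour as bzcat: decompress (-d) to standard output (-c).
        bzip2 -d -c "$file" | head -n "$lines"
    fi
}
```

For example, bz_head file.bz2 20 would show the first 20 lines. Swapping head for tail gives the last lines, at the cost of decompressing the whole stream.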
However, if your ultimate goal is to compare two compressed files (which may have been compressed at different levels, so that comparing the compressed files directly doesn't work), you can do this (assuming bash as your shell):
cmp <(bzcat file1.bz2) <(bzcat file2.bz2)
This will decompress both files and compare the uncompressed data byte-by-byte without ever storing either of the decompressed files anywhere.
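If your shell lacks the <( ) process substitution used above (plain POSIX sh, for instance), roughly the same comparison can be sketched with a named pipe. The function name bz_same is hypothetical, and the sketch assumes mktemp and mkfifo are available. It does not check whether bzcat itself failed, so two unreadable files would compare as equal.

```shell
# bz_same FILE1 FILE2: hypothetical helper, not a standard command.
# Exits with status 0 if the decompressed contents are identical,
# 1 if they differ, 2 if the temporary pipe could not be set up.
# The body runs in a subshell so its variables and background job
# do not leak into the calling shell.
bz_same() (
    dir=$(mktemp -d) || exit 2
    mkfifo "$dir/first" || { rm -rf "$dir"; exit 2; }
    # Decompress the first file into the named pipe in the background.
    bzcat "$1" > "$dir/first" &
    # Decompress the second file to cmp's standard input ("-").
    # cmp -s prints nothing and reports the result via its exit status.
    bzcat "$2" | cmp -s "$dir/first" -
    status=$?
    wait
    rm -rf "$dir"
    exit "$status"
)
```

It could then be used as: if bz_same file1.bz2 file2.bz2; then echo identical; else echo different; fi. As with the bash version, neither decompressed file is ever stored on disk; only the small named pipe entry is created temporarily.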