
How to list the first or last 10 lines from a file without decompressing it in Linux [closed]

Tags:

linux

I have a .bz2 file. I want to list the first or last 10 lines without decompressing it, as it is too big. I tried head -10 and tail -10, but I see gibberish. I also need to compare two compressed files to check whether they are similar. How can I achieve this without decompressing the files?

EDIT: Similar means identical (have the same content).

Wiliam A asked Dec 26 '22 09:12


1 Answer

bzip2 is a block-based compression algorithm, so in theory you could locate and decompress only the blocks you need. In practice this would be complicated (e.g. what if the last ten lines you want span two or more compressed blocks?).

To answer your immediate question, you can do the following. It decompresses the data as a stream, which is in a sense wasteful, but it never stores the decompressed file anywhere, so you don't run into storage capacity issues. With head, the pipeline normally ends as soon as head has printed its lines, so only the start of the file is decompressed; with tail, the whole file has to be read through to the end:

bzcat file.bz2 | head -10
bzcat file.bz2 | tail -10

If your distribution doesn't include bzcat (which would be a bit unusual in my experience), bzcat is equivalent to bzip2 -d -c.
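The two pipelines above can be wrapped in small shell functions. This is only a sketch: the names bz_head and bz_tail are made up for illustration, and it uses bzip2 -dc so that it also works where bzcat is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not standard commands).

# Print the first N lines (default 10) of a bzip2-compressed file.
# head exits after N lines, which normally stops bzip2 early.
bz_head() {
    local file=$1 lines=${2:-10}
    bzip2 -dc -- "$file" | head -n "$lines"
}

# Print the last N lines (default 10) of a bzip2-compressed file.
# tail must see the whole stream, so the full file is decompressed,
# but nothing is written to disk.
bz_tail() {
    local file=$1 lines=${2:-10}
    bzip2 -dc -- "$file" | tail -n "$lines"
}
```

For example, bz_head file.bz2 prints the first 10 lines and bz_tail file.bz2 25 prints the last 25.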

However, if your ultimate goal is to compare two compressed files (which may have been compressed at different levels, so comparing the compressed files directly doesn't work), you can do this (assuming bash as your shell):

cmp <(bzcat file1.bz2) <(bzcat file2.bz2)

This will decompress both files and compare the uncompressed data byte-by-byte without ever storing either of the decompressed files anywhere.
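If the comparison needs to be used from a script, the exit status of cmp can drive the decision. The following is a minimal sketch assuming bash (for process substitution); bz_same is a hypothetical name, and the readability check is an added assumption, since cmp on its own cannot tell a missing input from an empty one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two .bz2 files decompress to the
# same bytes. Requires bash because of <( ... ) process substitution.
bz_same() {
    # Refuse unreadable inputs up front; otherwise two missing files
    # would both look like empty streams and compare as identical.
    [ -r "$1" ] && [ -r "$2" ] || return 2

    # cmp -s prints nothing: exit 0 = identical, 1 = different.
    # Note: the exit status of the bzip2 processes is not checked here.
    cmp -s <(bzip2 -dc -- "$1") <(bzip2 -dc -- "$2")
}
```

For example: bz_same file1.bz2 file2.bz2 && echo identical || echo different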

twalberg answered May 06 '23 16:05