How do I tell in Perl what the size of a file inside a gzip archive is, without unpacking the whole file?

Tags: gzip, perl

I have a bunch of ridiculously big files (multiple gigabytes in size) that have a really high compression ratio (1:200 or better). I have to process them and would like to at least show some kind of progress estimate. For that reason I'd like to know the size of the file inside the .gz, so I can compare it with how much I have already extracted.

However, since unpacking the whole file in advance every time is prohibitively slow and a waste of time, I'd like to find out the size without doing that.

I know it is possible: I can open gzip files with Total Commander, and its viewer plugin shows the right size. (I know it's not unpacking, because it shows the size immediately, which wouldn't be possible with a 10 GB file inside the gzip.)

There probably are some header fields that contain that information.

However, looking through the docs of various CPAN modules, I couldn't find anything that fits the bill. IO::Uncompress::Gunzip lets me get at a header, but it doesn't contain any file size information.

Any suggestions?

asked Feb 09 '11 15:02

Mithaldu

2 Answers

Just so there's a proper answer for this:

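sub get_gz_size {
    my ( $gz_file ) = @_;
    # list-form pipe open: no shell, so spaces or quotes in the name are safe
    open( my $fh, '-|', 'gzip', '--list', $gz_file )
        or die "Cannot run gzip --list on $gz_file: $!";
    my @raw = <$fh>;
    close $fh;
    # line 0 is the column header; field 1 of line 1 is the uncompressed size
    my $size = ( split " ", $raw[1] )[1];
    return $size;
}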
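If you want to test the parsing without shelling out, the line-splitting can live in its own function. This is a sketch, not part of the answer above: `parse_gzip_list` is a hypothetical name, and it assumes GNU gzip's `--list` layout (a header line, then one line per file with compressed size, uncompressed size, ratio and name). Also note that, according to the BUGS section of the GNU gzip manual, `--list` reports the uncompressed size modulo 2^32, so it is wrong for files of 4 GB and larger, which matters for the file sizes in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the uncompressed size from the text printed
# by `gzip --list` for a single file. Assumes the GNU gzip layout: a header
# line followed by "compressed uncompressed ratio uncompressed_name".
sub parse_gzip_list {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    die "Unexpected gzip --list output\n" unless @lines >= 2;
    # split " " skips leading whitespace, so field 1 is the uncompressed size
    my ( undef, $uncompressed ) = split " ", $lines[1];
    return $uncompressed;
}

my $sample = <<'END';
         compressed        uncompressed  ratio uncompressed_name
            5242880          1073741824  99.5% big.log
END

print parse_gzip_list($sample), "\n";    # prints 1073741824
```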

answered Nov 15 '22 04:11

Mithaldu

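As described in the comments on the question, the last 4 bytes of a gzip file hold the ISIZE field: the size of the uncompressed data modulo 2^32, stored as an unsigned little-endian 32-bit integer (see RFC 1952).

Here's some code I wrote to read that value, given a file path: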

use Fcntl qw(SEEK_END);

sub get_isize
{
   my ($file) = @_;

   my $isize_len = 4;

   # open in raw (binary) mode so the trailer bytes are read unmodified
   open( my $FH, '<:raw', $file )
      or die "Failed to open $file: $!";

   # seek back from EOF to the start of the 4-byte ISIZE field
   seek( $FH, -$isize_len, SEEK_END )
      or die "Failed to seek $isize_len bytes back from EOF in $file: $!";

   # read the trailer bytes into $mod32_isize; a short read is an error too
   my $bytes_read = read( $FH, my $mod32_isize, $isize_len );
   die "Failed to read from $file: $!" unless defined $bytes_read;
   die "Expected $isize_len bytes from $file, read $bytes_read instead"
      unless $bytes_read == $isize_len;

   close $FH;

   # decode the unsigned 32-bit little-endian value ('V' in unpack)
   my $dec_isize = unpack( 'V', $mod32_isize );

   return $dec_isize;
}
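To see the trailer layout without a multi-gigabyte file, here is a small sketch that compresses a known string in memory with IO::Compress::Gzip (a core module in modern Perls) and decodes the last four bytes the same way `get_isize` does. `trailer_isize` is a hypothetical name. One caveat worth knowing: a gzip file can consist of several concatenated members (for example `cat a.gz b.gz > c.gz`), and each member has its own trailer, so the last 4 bytes then describe only the last member.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: decode ISIZE from the last 4 bytes of a gzip stream
# held in memory (unsigned 32-bit, little-endian, i.e. unpack 'V').
sub trailer_isize {
    my ($gz_data) = @_;
    # 10-byte header + 8-byte trailer is the smallest possible gzip member
    die "Too short to be a gzip stream\n" if length($gz_data) < 18;
    return unpack( 'V', substr( $gz_data, -4 ) );
}

# Highly compressible input: 1 MiB of a single repeated byte
my $input = 'x' x ( 1024 * 1024 );
gzip( \$input => \my $compressed ) or die "gzip failed: $GzipError";

# ISIZE and actual should both be 1048576
printf "compressed: %d bytes, ISIZE: %d, actual: %d\n",
    length($compressed), trailer_isize($compressed), length($input);
```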

Because ISIZE is stored modulo 2^32, it wraps around at 4 GiB (4294967296 bytes). For uncompressed files of 4 GiB or more, you'll need to guess whether to add 4 GiB to the retrieved value, based on the expected minimum compression factor:

use constant MIN_COMPRESS_FACTOR => 200;
my $outer_bytes = -s $path;
my $inner_bytes = get_isize( $path );
# if ISIZE is below the smallest plausible size, assume it wrapped once
$inner_bytes += 4294967296 if( $inner_bytes < $outer_bytes * MIN_COMPRESS_FACTOR );

If your uncompressed file may be larger than 2 × 4294967296 bytes (8 GiB), you'll have to guess how many multiples of 4294967296 to add. I've never tested this, and it only works out if you can estimate the compression ratio accurately:

# use this instead of the single 4 GiB check above, not in addition to it
my $estimated_multiplier = int( ( $outer_bytes * MIN_COMPRESS_FACTOR ) / 4294967296 );
$inner_bytes += ( 4294967296 * $estimated_multiplier ) if( $estimated_multiplier );
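The same guess can be framed slightly differently: the true size is ISIZE + k × 2^32 for some whole number k ≥ 0, and a minimum compression factor means the true size is at least the compressed size times that factor, so pick the smallest k that reaches that bound. This is an alternative formulation, not the answerer's code, and it is untested on real multi-gigabyte files; `estimate_inner_size` is a hypothetical name, and the result is only as good as the compression-factor assumption.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

use constant TWO_32 => 4294967296;    # 2**32, where ISIZE wraps around

# Hypothetical helper: smallest size of the form $isize + k * 2**32 (k >= 0)
# that is at least $outer_bytes * $min_factor. Assumes the real compression
# ratio is never below $min_factor; if that assumption is wrong, so is this.
sub estimate_inner_size {
    my ( $isize, $outer_bytes, $min_factor ) = @_;
    my $lower_bound = $outer_bytes * $min_factor;
    return $isize if $isize >= $lower_bound;
    my $k = ceil( ( $lower_bound - $isize ) / TWO_32 );
    return $isize + $k * TWO_32;
}

# 50 MB compressed at >= 1:200 means at least 10 GB uncompressed,
# so an ISIZE of 100 needs three wraps
print estimate_inner_size( 100, 50_000_000, 200 ), "\n";    # prints 12884901988
```

Compared with the `int(...)` multiplier, rounding up guarantees the estimate never falls below the bound implied by the compression factor; with `int`, a small ISIZE can leave the total one wrap short.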

answered Nov 15 '22 05:11

errant.info
