
Zlib in Ruby to uncompress .gz

Tags: ruby, rubygems

I have a .gz file that contains an XML document. Does anyone know how to use Zlib properly? So far, I have the following code:

require 'zlib'
Zlib::GzipReader.open('PRIDE_Exp_Complete_Ac_1015.xml.gz') { |gz|
    g = File.new("PRIDE_Exp_Complete_Ac_1015.xml", "w")
      g.write(gz)
      g.close()
}

But this creates a blank .xml document. Does anyone know how I can properly do this?

Bobby asked Jul 02 '10 22:07



2 Answers

Zlib::GzipReader works like most IO-like classes in Ruby. You have an open call, and when you pass a block to it, the block receives the IO-like object. Think of it as a convenient way of doing something with a file or resource for the duration of the block.

But that means that in your example gz is an IO-like object, not the contents of the gzip file as you expected. You still need to read from it to get those contents. The simplest fix would then be:

g.write(gz.read)

Note that this reads the entire uncompressed contents of the gzip file into memory at once.
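If the XML is large, one way to avoid holding it all in memory is to read and write a chunk at a time. The following is only a sketch: the helper name gunzip_in_chunks and the 16 KB default chunk size are my own choices for illustration, not something from the question.

```ruby
require 'zlib'

# Hypothetical helper (name and default chunk size are illustrative).
# Decompresses src_path into dst_path one chunk at a time, so the
# whole document is never held in memory.
def gunzip_in_chunks(src_path, dst_path, chunk_size = 16 * 1024)
  Zlib::GzipReader.open(src_path) do |gz|
    File.open(dst_path, 'wb') do |out|
      # read(n) returns nil at the end of the stream, which ends the loop.
      while (chunk = gz.read(chunk_size))
        out.write(chunk)
      end
    end
  end
end
```

With the file from the question, this would be called as gunzip_in_chunks('PRIDE_Exp_Complete_Ac_1015.xml.gz', 'PRIDE_Exp_Complete_Ac_1015.xml').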

If all you're really doing is copying from one file to another, you can use the more efficient IO.copy_stream method. Your example might then look like:

Zlib::GzipReader.open('PRIDE_Exp_Complete_Ac_1015.xml.gz') do |input_stream|
  File.open("PRIDE_Exp_Complete_Ac_1015.xml", "w") do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end

Behind the scenes, this will try to use the sendfile syscall, which is available in some specific situations on Linux. Otherwise, it does the copying in fast C code, 16KB at a time. I learned this from the Ruby 1.9.1 source code.
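As a small variation, IO.copy_stream also accepts a filename as the destination and returns the number of bytes copied, so the copy can be wrapped up as below. This is a sketch only, and the helper name gunzip_file is made up for the example.

```ruby
require 'zlib'

# Hypothetical helper: the method name is illustrative only.
# Returns the number of uncompressed bytes written to dst_path.
def gunzip_file(src_path, dst_path)
  bytes = 0
  Zlib::GzipReader.open(src_path) do |gz|
    # IO.copy_stream accepts a filename as the destination and
    # returns the number of bytes it copied.
    bytes = IO.copy_stream(gz, dst_path)
  end
  bytes
end
```

The returned byte count is handy as a quick sanity check that the output is not empty.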

Stéphan Kochen answered Oct 21 '22 17:10



Here is a Ruby one-liner for inflating a zlib-compressed file such as a git object (cd .git/objects/ first and identify the path to any object):

ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < ./74/c757240ec596063af8cd273ebd9f67073e1208
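Keep in mind that a git object is a raw zlib stream, while a .gz file has a gzip header, and Zlib::Inflate with default settings only accepts the former. The sketch below builds both kinds of data in memory (the sample text and variable names are made up for illustration) and shows one way to inflate gzip data by adding 32 to the window bits, which asks zlib to auto-detect the header.

```ruby
require 'zlib'
require 'stringio'

text = "<xml>hello</xml>"

# A raw zlib stream, the format git uses for loose objects.
zlib_data = Zlib::Deflate.deflate(text)

# A gzip stream, the format of a .gz file, built in memory for illustration.
buffer = StringIO.new(String.new)
gz = Zlib::GzipWriter.new(buffer)
gz.write(text)
gz.finish # flushes the gzip footer without closing the StringIO
gzip_data = buffer.string

# Zlib::Inflate with default settings handles the zlib stream...
from_zlib = Zlib::Inflate.new.inflate(zlib_data)

# ...but raises on the gzip header.
begin
  Zlib::Inflate.new.inflate(gzip_data)
  gzip_rejected = false
rescue Zlib::DataError
  gzip_rejected = true
end

# Adding 32 to the window bits enables automatic zlib/gzip header detection.
from_gzip = Zlib::Inflate.new(Zlib::MAX_WBITS + 32).inflate(gzip_data)
```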
tuxdna answered Oct 21 '22 18:10
