
Ruby: Download zip file and extract

Tags: ruby, zip, rubyzip

I have a Ruby script that downloads a remote ZIP file from a server using Ruby's open command. When I inspect the downloaded content, it looks something like this:

PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00@\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00

I tried using the Rubyzip gem (https://github.com/rubyzip/rubyzip) along with its class Zip::ZipInputStream like this:

stream = open("http://localhost:3000/foobar.zip").read # this outputs the zip content from above
zip = Zip::ZipInputStream.new stream

Unfortunately, this throws an error:

 Failure/Error: zip = Zip::ZipInputStream.new stream
 ArgumentError:
   string contains null byte

My questions are:

  1. Is it possible, in general, to download a ZIP file and extract its content in-memory?
  2. Is Rubyzip the right library for it?
  3. If so, how can I extract the content?
asked Oct 16 '15 14:10 by 23tux


2 Answers

I found the solution myself, and then found the same approach on Stack Overflow :D (How to iterate through an in-memory zip file in Ruby)

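require 'httparty'
require 'zip'      # provided by the rubyzip gem
require 'stringio'

input = HTTParty.get("http://example.com/somedata.zip").body
Zip::InputStream.open(StringIO.new(input)) do |io|
  while (entry = io.get_next_entry)
    puts entry.name
    parse_zip_content io.read # parse_zip_content stands for your own method
  end
end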
  1. Download your ZIP file. I'm using HTTParty for this, but you could also use Ruby's open command (require 'open-uri').
  2. Convert the downloaded string into a StringIO stream using StringIO.new(input).
  3. Iterate over every entry inside the ZIP archive using io.get_next_entry (it returns an instance of Zip::Entry).
  4. io.read gives you the content of the current entry, and entry.name gives you its filename.
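
A likely explanation for the original "string contains null byte" error is that the stream class treated the String it was given as a file path and handed it to File.open. I have not checked that against the rubyzip source for every version, so take it as an assumption. The effect itself can be reproduced with the standard library alone. The variable names below are only for illustration:

```ruby
require 'stringio'

# First bytes of the dump shown in the question.
zip_bytes = "PK\x03\x04\x14\x00\b\x00".b

# Used as a path, the bytes are rejected before any ZIP parsing happens.
error_message =
  begin
    File.open(zip_bytes, 'rb')
    nil
  rescue ArgumentError => e
    e.message
  end
puts error_message # prints "string contains null byte"

# Wrapped in a StringIO, the same bytes can be read like an open file.
io = StringIO.new(zip_bytes)
signature = io.read(4) # the local file header signature, "PK" plus two bytes
```

That is why wrapping the download in StringIO.new makes the difference in step 2.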
answered Dec 02 '22 06:12 by 23tux


As I commented in https://stackoverflow.com/a/43303222/4196440, we can just use Zip::File.open_buffer:

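require 'open-uri'
require 'zip' # provided by the rubyzip gem

# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open instead.
content = URI.open('http://localhost:3000/foobar.zip')

Zip::File.open_buffer(content) do |zip|
  zip.each do |entry|
    puts entry.name
    # Do whatever you want with the content files,
    # e.g. read one into memory with entry.get_input_stream.read
  end
end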
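
To confirm that the bytes printed in the question are a complete archive, and that extraction really needs nothing on disk, here is a rough sketch that decodes that exact dump using only Zlib from the standard library. It is a hand-rolled reader, not how rubyzip works internally, and unzip_in_memory is a hypothetical helper name. It assumes no ZIP64, no encryption, no archive comment, and only stored or deflated entries, so prefer rubyzip for real use.

```ruby
require 'zlib'

# The dump from the question, copied verbatim.
sample_zip = "PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00@\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00".b

# Hypothetical helper: returns { name => content } for a ZIP held in a String.
def unzip_in_memory(bytes)
  # The end-of-central-directory record sits at the end of the archive.
  eocd = bytes.rindex("PK\x05\x06".b)
  raise ArgumentError, 'not a ZIP archive' if eocd.nil?

  # Entry count (2 bytes), directory size (4), directory offset (4).
  count, _size, offset = bytes[eocd + 10, 10].unpack('vVV')

  files = {}
  count.times do
    # Fixed-position fields of one central directory entry.
    method = bytes[offset + 10, 2].unpack1('v')
    compressed_size = bytes[offset + 20, 4].unpack1('V')
    name_len, extra_len, comment_len = bytes[offset + 28, 6].unpack('vvv')
    local = bytes[offset + 42, 4].unpack1('V')
    name = bytes[offset + 46, name_len]

    # The local header has its own name and extra field lengths.
    local_name_len, local_extra_len = bytes[local + 26, 4].unpack('vv')
    raw = bytes[local + 30 + local_name_len + local_extra_len, compressed_size]

    # Method 8 is raw deflate (negative window bits); otherwise assume stored.
    files[name] = method == 8 ? Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(raw) : raw
    offset += 46 + name_len + extra_len + comment_len
  end
  files
end

files = unzip_in_memory(sample_zip)
files.each { |name, content| puts "#{name}: #{content.bytesize} bytes" }
# prints "foobar.txt: 14 bytes"
```

The single entry decodes to the text "foobar foobar" followed by a newline, which matches the 14-byte uncompressed size recorded in the archive.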
answered Dec 02 '22 07:12 by Prodis