
Rails: on-the-fly streaming of output in zip format?

I need to serve some data from my database in a zip file, streaming it on the fly such that:

  • I do not write a temporary file to disk
  • I do not compose the whole file in RAM

I know that I can do streaming generation of zip files to the filesystem using ZipOutputStream as here. I also know that I can do streaming output from a Rails controller by setting response_body to a Proc as here. What I need (I think) is a way of plugging those two things together. Can I make Rails serve a response from a ZipOutputStream? Can I get ZipOutputStream to give me incremental chunks of data that I can feed into my response_body Proc? Or is there another way?

kdt asked Jan 25 '11 18:01


1 Answer

Short Version

https://github.com/fringd/zipline

Long Version

jo5h's answer didn't work for me in Rails 3.1.1.

I found a YouTube video that helped, though:

http://www.youtube.com/watch?v=K0XvnspdPsc

The crux of it is creating an object that responds to each and assigning it as the response body. This is what I did:

  class ZipGenerator
    def initialize(model)
      @model = model
    end

    # the server calls each on the response body and sends out every chunk passed to the block
    def each( &block )
      # fake IO object: always reports position 0 and forwards writes to the block
      output = Object.new
      output.define_singleton_method :tell, Proc.new { 0 }
      output.define_singleton_method :pos=, Proc.new { |x| 0 }
      output.define_singleton_method :<<, Proc.new { |x| block.call(x) }
      output.define_singleton_method :close, Proc.new { nil }
      Zip::IoZip.open(output) do |zip|
        @model.attachments.all.each do |attachment|
          zip.put_next_entry "#{attachment.name}.pdf"
          file = attachment.file.file.send :file
          file = File.open(file) if file.is_a? String
          # copy the attachment into the zip 2048 bytes at a time
          while buffer = file.read(2048)
            zip << buffer
          end
        end
      end
      sleep 10
    end

  end

  # controller action
  def getzip
    self.response_body = ZipGenerator.new(@model)

    # this is a hack to prevent middleware from buffering
    headers['Last-Modified'] = Time.now.to_s
  end

EDIT:

The above solution didn't ACTUALLY work. The problem is that rubyzip needs to jump around the file to rewrite the headers for entries as it goes. In particular, it needs to write the compressed size BEFORE it writes the data, which is just not possible in a truly streaming situation, so this task may ultimately be impossible. It might be possible to buffer one whole file at a time, but that seemed less worthwhile. In the end I just wrote to a tmp file; on Heroku I can write to Rails.root/tmp. That gives less instant feedback and is not ideal, but it was necessary.
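A quick way to see the problem, in plain Ruby with the stdlib zlib (nothing rubyzip-specific, and the two chunks are made-up sample data): with deflate, the compressed size only exists once the last chunk has gone through the compressor, yet the entry header that carries it sits in front of the data.

```ruby
require "zlib"

# Raw deflate (negative window bits), the variant used inside zip entries.
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)

compressed_size = 0
["a" * 1000, "b" * 1000].each do |chunk|
  # deflate may return an empty string while it buffers input internally
  compressed_size += deflater.deflate(chunk).bytesize
end

# Only after finish is the total known; by now a streaming writer
# would already have sent the header and most of the data.
compressed_size += deflater.finish.bytesize
deflater.close
```

So a writer that puts the size in the header either has to go back and patch it, or has to know it before the first byte of data goes out.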

ANOTHER EDIT:

I got another idea recently: we COULD know the compressed size of the files if we do not compress them. The plan goes something like this:

Subclass the ZipOutputStream class as follows:

  • always use the "stored" compression method, in other words do not compress
  • ensure we never seek backwards to change file headers, get it all right up front
  • rewrite any code related to the TOC (the zip central directory) that seeks

I haven't tried to implement this yet, but will report back if there's any success.
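For what it's worth, the "get it all right up front" part might look roughly like the sketch below. It is plain Ruby, not rubyzip code, and stored_header_fields is a made-up helper name. One catch: the entry header also carries a CRC-32, so even without compression the data has to be read once before anything is sent.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: collects the values a "stored" (uncompressed) entry
# needs in its header before any data is written.
def stored_header_fields(io, chunk_size = 2048)
  crc = 0
  size = 0
  while (buffer = io.read(chunk_size))
    crc = Zlib.crc32(buffer, crc) # running CRC over the chunks read so far
    size += buffer.bytesize
  end
  io.rewind # go back to the start so the data can be streamed afterwards
  # stored means no compression, so both sizes are the same number
  { crc32: crc, compressed_size: size, uncompressed_size: size }
end

fields = stored_header_fields(StringIO.new("hello world"))
```

That means every file is read twice (once for the CRC, once to send it), which is the price of never seeking backwards in the output.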

OK ONE LAST EDIT:

The description of the zip format at http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers mentions that there's a bit you can flip to put the size, compressed size and CRC AFTER a file's data. So my new plan was to subclass ZipOutputStream so that it:

  • sets this flag
  • writes sizes and CRCs after the data
  • never rewinds output
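To make the three points concrete, here is a stripped-down sketch in plain Ruby. It is NOT zipline's code and does not touch rubyzip: StreamingZipSketch, add_entry and finish are names made up for illustration. It assumes stored (uncompressed) entries, a fixed placeholder timestamp and no zip64, so everything has to stay under 4 GB. The bit in question is bit 3 of the general purpose flags, and the record written after the data is the data descriptor.

```ruby
require "zlib"

# Append-only zip writer sketch: it only ever calls << on the sink.
class StreamingZipSketch
  FLAGS    = 0x0008 # bit 3: CRC and sizes follow the data
  STORED   = 0      # compression method 0, i.e. no compression
  DOS_TIME = 0
  DOS_DATE = 33     # 1980-01-01, placeholder timestamp

  def initialize(sink)
    @sink = sink    # anything that responds to <<
    @offset = 0     # bytes written so far, tracked here instead of tell/seek
    @entries = []
  end

  # chunks: any enumerable that yields strings
  def add_entry(name, chunks)
    name = name.b
    header_offset = @offset
    # local header with CRC and both sizes left as zero
    emit [0x04034b50, 20, FLAGS, STORED, DOS_TIME, DOS_DATE,
          0, 0, 0, name.bytesize, 0].pack("VvvvvvVVVvv")
    emit name
    crc = 0
    size = 0
    chunks.each do |chunk|
      crc = Zlib.crc32(chunk, crc)
      size += chunk.bytesize
      emit chunk
    end
    # data descriptor: the real CRC and sizes, written AFTER the data
    emit [0x08074b50, crc, size, size].pack("VVVV")
    @entries << [name, crc, size, header_offset]
  end

  # central directory plus end record, again written strictly forwards
  def finish
    dir_offset = @offset
    @entries.each do |name, crc, size, header_offset|
      emit [0x02014b50, 20, 20, FLAGS, STORED, DOS_TIME, DOS_DATE,
            crc, size, size, name.bytesize, 0, 0, 0, 0, 0,
            header_offset].pack("VvvvvvvVVVvvvvvVV")
      emit name
    end
    emit [0x06054b50, 0, 0, @entries.size, @entries.size,
          @offset - dir_offset, dir_offset, 0].pack("VvvvvVVv")
  end

  private

  def emit(bytes)
    @sink << bytes
    @offset += bytes.bytesize
  end
end

sink = String.new # binary string standing in for the HTTP response
zip = StreamingZipSketch.new(sink)
zip.add_entry("a.txt", ["hello ", "world"])
zip.finish
```

In a Rails response body the sink would be an object whose << hands each chunk to the block given to each, like the output object in the ZipGenerator above, minus tell and pos=, since nothing here ever asks for the position.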

Furthermore, I needed to get all the hacks required to stream output in Rails fixed up.

Anyways, it all worked!

Here's a gem:

https://github.com/fringd/zipline

fringd answered Sep 22 '22 17:09