
Does ruby open-uri HTTP Streaming throttle the download or save to a temp file?




I have a large CSV file on a server I'd like to download and process in chunks, without reading the whole thing into memory. After a bit of finagling I've come up with this:

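require 'open-uri'

# On Ruby 3.0+, call URI.open here; Kernel#open no longer accepts URLs.
open("http://example.com/#{LARGE_CSV_FILE}") do |file|
  # IO#each yields lines, so each_slice batches 50,000 lines at a time.
  file.each_slice(50_000) do |fifty_thousand_lines|
    MyModel.import fifty_thousand_lines.join
  end
end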

My understanding is that open-uri's #open will wrap the HTTP GET and return an IO-like enumerable object. #each_slice(n) will pass the block an array of n lines at a time. I then join and process those lines.
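To make that batching behaviour concrete, here is a minimal sketch that needs no network access. It assumes a StringIO is a fair stand-in for the IO-like object open-uri hands to the block; the sample data and the batch size of 2 are made up for illustration.

```ruby
require 'stringio'

# A StringIO stands in for the IO-like object returned by open-uri
# (an assumption for illustration only).
io = StringIO.new("a,1\nb,2\nc,3\nd,4\ne,5\n")

# IO-like objects include Enumerable and #each yields one line at a time,
# so #each_slice(2) produces arrays of up to two lines each.
batches = io.each_slice(2).map { |lines| lines.join }

batches.each { |batch| p batch }
```

Each batch is a single string of complete lines, which is the shape the join in the snippet above passes to the import call.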

This imports just fine, and watching my OS X iStat menu, it looks like the memory usage of the ruby process doesn't get out of hand. However, it looks like I downloaded all of the file at once. How can this be without the memory usage exploding?

Does ruby download it to a temporary file and then read it from disk line by line? I would have thought open-uri would instead throttle the HTTP connection and only download more data when its block has finished processing its batch of data.

Is this the right way of downloading and processing a file without loading it all into memory?

Gabe Durazo asked Oct 09 '13 18:10

1 Answer

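Yes, it does download to a tempfile. open-uri fetches the entire response before it yields to your block, so the connection is not throttled; bodies larger than about 10 KB are spooled to a Tempfile, while smaller ones are kept in a StringIO. This is easily observed from the console: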

2.0.0-p247 :001 > require 'open-uri'
 => true
2.0.0-p247 :002 > f = open("http://stackoverflow.com/questions/19279715/does-ruby-open-uri-http-streaming-throttle-the-download-or-save-to-a-temp-file")
 => #<Tempfile:/tmp/open-uri20140220-27172-1kcjwk2>
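If the goal is to process data as it arrives rather than after the full download, one possible alternative is Net::HTTP's block form of read_body, which yields the body in segments as they are read from the socket. The sketch below is an illustration under assumptions, not what open-uri does: the helper names each_line_batch and stream_line_batches are hypothetical, and segments do not align with line boundaries, so the first helper reassembles complete lines before batching.

```ruby
require 'net/http'
require 'uri'

# Turns an enumerable of arbitrary string chunks into batches of complete
# lines, yielding each batch (an array of lines) to the block.
def each_line_batch(chunks, batch_size)
  buffer = +''
  batch = []
  chunks.each do |chunk|
    buffer << chunk
    # Move every complete line out of the buffer; a trailing partial
    # line stays behind until the next chunk completes it.
    while (newline = buffer.index("\n"))
      batch << buffer.slice!(0..newline)
      if batch.size == batch_size
        yield batch
        batch = []
      end
    end
  end
  batch << buffer unless buffer.empty? # final line without a newline
  yield batch unless batch.empty?
end

# Hypothetical helper: streams the body at `url` and yields line batches.
def stream_line_batches(url, batch_size, &block)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      # read_body with a block yields segments as they arrive instead of
      # buffering the whole body in memory or in a tempfile.
      each_line_batch(response.enum_for(:read_body), batch_size, &block)
    end
  end
end
```

With the question's names, a call might look like stream_line_batches("http://example.com/#{LARGE_CSV_FILE}", 50_000) { |lines| MyModel.import lines.join }. Because the next segment is read only after the block returns, a slow import should also slow the download, though the exact behaviour depends on socket buffering.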
Chris Heald answered Oct 20 '22 06:10