
Stream the response body of an HTTP GET to an HTTP POST with Ruby

Tags:

http

ruby

I'm trying to download a large file and then post that file to a REST endpoint using Ruby. The file could be very large, i.e., more than could be stored in memory or even in a temp file on disk. I've been trying this with Net::HTTP, but I'm open to solutions with any other library (rest-client, etc) as long as they do what I'm trying to do.

Here's what I tried:

require 'net/http'

source_uri = URI("https://example.org/very_large_file")
source_request = Net::HTTP::Get.new(source_uri)
source_http = Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https')

target_uri = URI("https://example2.org/rest/resource")
target_request = Net::HTTP::Post.new(target_uri)
target_http = Net::HTTP.start(target_uri.host, target_uri.port, use_ssl: target_uri.scheme == 'https')

source_response = source_http.request(source_request)
target_request.body = source_response.read_body # problem: this buffers the entire body in memory
target_request.content_type = 'multipart/form-data'
target_response = target_http.request(target_request)

What I want to happen is for source_response.read_body to return a stream, which I can then pass to the target_request in chunks.
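(For reference, Net::HTTP does yield the download in chunks when read_body is given a block inside a streaming request, though that alone doesn't wire the chunks into the POST. A minimal, self-contained sketch, using a throwaway local server so it runs without external network:)

```ruby
require 'net/http'
require 'socket'

# Throwaway local HTTP server serving one fixed response.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
BODY = 'x' * 100_000

Thread.new do
  client = server.accept
  # Discard the request headers.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write "HTTP/1.1 200 OK\r\nContent-Length: #{BODY.bytesize}\r\nConnection: close\r\n\r\n"
  client.write BODY
  client.close
end

# read_body with a block streams the body chunk by chunk instead of buffering it all.
sizes = []
Net::HTTP.start('127.0.0.1', port) do |http|
  http.request(Net::HTTP::Get.new('/very_large_file')) do |response|
    response.read_body { |chunk| sizes << chunk.bytesize }
  end
end
puts "received #{sizes.sum} bytes in #{sizes.size} chunk(s)"
```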

asked Feb 19 '16 by Bill Ingram

3 Answers

Answering my own question: here's my solution. Note that in order to make this work, I needed to monkey patch Net::HTTP so I could access the socket in order to manually read chunks from the response object. If you have a better solution, I'd still like to see it.

require 'net/http'
require 'excon'

# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end

source_uri = URI("https://example.org/very_large_file")
target_uri = URI("https://example2.org/rest/resource")

Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new source_uri

  http.request request do |response|
    len = response.content_length
    p "reading #{len} bytes..."
    read_bytes = 0
    chunk = ''

    # Excon calls this lambda repeatedly; returning an empty string ends the upload
    chunker = lambda do
      begin
        chunk = if read_bytes + Excon::CHUNK_SIZE < len
          response.socket.read(Excon::CHUNK_SIZE).to_s
        else
          response.socket.read(len - read_bytes).to_s
        end
        read_bytes += chunk.size
      rescue EOFError
        chunk = '' # signal end of stream instead of resending the previous chunk
      end
      p "read #{read_bytes} bytes"
      chunk
    end

    Excon.ssl_verify_peer = false
    Excon.post(target_uri.to_s, :request_block => chunker)

  end
end
answered Sep 17 '22 by Bill Ingram


By using the excon and rest-client gems, you should be able to stream the download and upload it in multiple parts.

Unfortunately, I could not find a way to stream data with rest-client, or to post multipart/form-data with excon, so you will have to combine the two.

Here's a snippet that should hopefully work:

require 'excon'
require 'rest-client'

streamer = lambda do |chunk, remaining_bytes, total_bytes|
  puts "Remaining: #{(remaining_bytes.to_f / total_bytes * 100).round(1)}%"
  puts RestClient.post('http://posttestserver.com/post.php', :param1 => chunk)
end

Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)

After messing around, I got the following code working somewhat. It isn't consistent: sometimes it sends everything and sometimes it doesn't. I believe that's because the HTTP POST finishes before the download has.

require 'excon'
require 'uri'
require 'net/http'

# A thread-safe in-memory buffer exposing the read(size, out) interface
# that Net::HTTP expects from body_stream.
class Producer
  def initialize
    @mutex = Mutex.new
    @body = ''
  end

  def read(size, out = nil)
    chunk = nil

    @mutex.synchronize {
      chunk = @body.slice!(0, size)
    }

    # NB: returning nil when the buffer is momentarily empty is treated as
    # EOF by Net::HTTP, which is likely why the upload sometimes ends early.
    return nil if chunk.nil? || chunk.empty?
    out << chunk if out

    chunk
  end

  def produce(str)
    @mutex.synchronize {
      @body << str
    }
  end
end

@stream = Producer.new

uri = URI("yourpostaddresshere")
conn = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new uri.request_uri, {'Transfer-Encoding' => 'chunked', 'content-type' => 'text/plain'}
request.body_stream = @stream

Thread.new {
  streamer = lambda do |chunk, remaining_bytes, total_bytes|
    @stream.produce(chunk) 
  end

  Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)
}

conn.start do |http|
  http.request(request)
end

Credits to Roman; I modified his code slightly, since HTTP.start requires two arguments (a Ruby Net::HTTP change).

answered Sep 20 '22 by Nabeel


Without async I/O (which is awkward in Ruby), the only way is to use two threads connected by a pipe: one to fetch, the other to upload.

The pipe acts as a bounded buffer. IO.pipe returns a reader and a writer; whatever is written to the writer becomes available to the reader, and the reader blocks until data arrives. Pipes are backed by real file descriptors, so the I/O behaves exactly like a file (unlike "fake" streams such as StringIO).
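A minimal sketch of how the reader and writer ends behave (names here are illustrative):

```ruby
# The reader end blocks until the writer supplies data, and sees EOF
# once the writer end is closed.
rd, wr = IO.pipe

writer = Thread.new do
  3.times { |i| wr.write("chunk-#{i};") }
  wr.close
end

received = rd.read  # blocks until the writer closes, then returns all data
writer.join
puts received       # => "chunk-0;chunk-1;chunk-2;"
```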

Something like this:

require 'net/http'

def download_and_upload(source_url, dest_url)
  rd, wr = IO.pipe
  begin
    source_uri = URI.parse(source_url)

    Thread.start do
      begin
        Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
          req = Net::HTTP::Get.new(source_uri.request_uri)
          http.request(req) do |resp|
            resp.read_body do |chunk|
              wr.write(chunk)
              wr.flush
            end
          end
        end
      rescue IOError
        # Usually because the writer was closed
      ensure
        wr.close rescue nil
      end
    end

    dest_uri = URI.parse(dest_url)

    Net::HTTP.start(dest_uri.host, dest_uri.port, use_ssl: dest_uri.scheme == 'https') do |http|
      req = Net::HTTP::Post.new(dest_uri.request_uri)
      req.body_stream = rd
      http.request(req)
    end
  ensure
    rd.close rescue nil
    wr.close rescue nil
  end
end

I've not tested this since I don't have an endpoint at the moment, but this is the principle of it.

Note that I've left out error handling. If the downloader thread fails, you'll need to catch the error and signal it to the uploader thread. (If the uploader fails, the download will stop because the write pipe will close.)
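One way to do that signaling (a sketch, not part of the answer above): rescue inside the downloader thread and hand the exception back via Thread#value, so the caller can re-raise it after the upload returns.

```ruby
rd, wr = IO.pipe

# Downloader: rescue any failure and return it as the thread's value
# instead of letting the exception kill the thread silently.
downloader = Thread.new do
  begin
    wr.write('data')
    raise IOError, 'download failed'   # simulate a mid-stream failure
  rescue => e
    e
  ensure
    wr.close                           # closing the pipe unblocks the reader
  end
end

uploaded = rd.read        # the uploader side sees whatever arrived before the failure
error = downloader.value  # the captured exception (or the normal return value)

puts uploaded             # => "data"
puts error.message        # => "download failed"
# In real code you would `raise error if error.is_a?(Exception)` here.
```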

answered Sep 20 '22 by Alexander Staubo