I'm trying to download a large file and then post that file to a REST endpoint using Ruby. The file could be very large, i.e., more than could be stored in memory or even in a temp file on disk. I've been trying this with Net::HTTP, but I'm open to solutions with any other library (rest-client, etc) as long as they do what I'm trying to do.
Here's what I tried:
require 'net/http'
source_uri = URI("https://example.org/very_large_file")
source_request = Net::HTTP::Get.new(source_uri)
source_http = Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https')
target_uri = URI("https://example2.org/rest/resource")
target_request = Net::HTTP::Post.new(target_uri)
target_http = Net::HTTP.start(target_uri.host, target_uri.port, use_ssl: target_uri.scheme == 'https')
source_response = source_http.request(source_request)
target_request.body = source_response.read_body
target_request.content_type = 'multipart/form-data'
target_response = target_http.request(target_request)
What I want to happen is for source_response.read_body to return a stream, which I can then pass to the target_request in chunks.
Answering my own question: here's my solution. Note that in order to make this work, I needed to monkey patch Net::HTTP so I could access the socket in order to manually read chunks from the response object. If you have a better solution, I'd still like to see it.
require 'net/http'
require 'excon'

# Monkey patch: expose the response's underlying socket
class Net::HTTPResponse
  attr_reader :socket
end

source_uri = URI("https://example.org/very_large_file")
target_uri = URI("https://example2.org/rest/resource")

Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new(source_uri)
  http.request(request) do |response|
    len = response.content_length
    puts "reading #{len} bytes..."
    read_bytes = 0
    chunk = ''
    # Excon calls this lambda repeatedly; an empty string signals the end
    chunker = lambda do
      begin
        chunk = if read_bytes + Excon::CHUNK_SIZE < len
                  response.socket.read(Excon::CHUNK_SIZE).to_s
                else
                  response.socket.read(len - read_bytes).to_s
                end
        read_bytes += chunk.size
      rescue EOFError
        # ignore EOF
      end
      puts "read #{read_bytes} bytes"
      chunk
    end
    Excon.ssl_verify_peer = false
    Excon.post(target_uri.to_s, :request_block => chunker)
  end
end
By using the excon and rest-client gems you should be able to stream the data down and upload it in multiple parts. Unfortunately, I could not find a way to stream data with rest-client, or to post data as multipart/form-data with excon, so you will have to combine the two. Here's a snippet that should hopefully work.
require 'excon'
require 'rest-client'

streamer = lambda do |chunk, remaining_bytes, total_bytes|
  puts "Remaining: #{100.0 * remaining_bytes / total_bytes}%"
  puts RestClient.post('http://posttestserver.com/post.php', :param1 => chunk)
end

Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)
After messing around, I got the following code working somewhat. It doesn't appear to be consistent: sometimes it sends everything and sometimes it doesn't. I believe that's because the HTTP POST request ends before the download has finished.
require 'excon'
require 'uri'
require 'net/http'

# A thread-safe buffer that Net::HTTP reads the request body from
# while another thread is still producing data.
class Producer
  def initialize
    @mutex = Mutex.new
    @body = ''
  end

  # Net::HTTP calls read(size) on the body stream. NOTE: returning nil
  # signals EOF, so if the producer thread falls behind, the upload can
  # end before the download finishes -- the likely cause of the
  # inconsistency mentioned above.
  def read(size, out = nil)
    chunk = nil
    @mutex.synchronize {
      chunk = @body.slice!(0, size)
    }
    return nil if chunk.nil? || chunk.empty?
    out << chunk if out
    chunk
  end

  def produce(str)
    @mutex.synchronize {
      @body << str
    }
  end
end

@stream = Producer.new

uri = URI("yourpostaddresshere")
conn = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri, {'Transfer-Encoding' => 'chunked', 'content-type' => 'text/plain'})
request.body_stream = @stream

# Download in a background thread, feeding chunks into the Producer.
Thread.new {
  streamer = lambda do |chunk, remaining_bytes, total_bytes|
    @stream.produce(chunk)
  end
  Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)
}

conn.start do |http|
  http.request(request)
end
Credits to Roman; I modified it slightly, since Net::HTTP.start requires two arguments (a Ruby Net::HTTP change).
Without async I/O (which is awkward in Ruby), the only way is to use two threads connected by a pipe: one to fetch, the other to upload.
The pipe acts as a ring buffer: you get back a reader and a writer, whatever is written to the writer becomes available to the reader, and the reader blocks until data arrives. Pipes are backed by real file descriptors, so the I/O behaves exactly like a file (unlike "fake" streams such as StringIO).
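To see the reader/writer behavior in isolation, here's a minimal standalone sketch (not part of the solution itself): the reader blocks until the writer produces data, and closing the write end signals EOF.

```ruby
# Minimal demonstration of IO.pipe: a writer thread feeds data,
# the reader blocks until bytes arrive and sees EOF on close.
rd, wr = IO.pipe

writer = Thread.new do
  3.times { |i| wr.write("chunk-#{i} ") }
  wr.close  # closing the write end signals EOF to the reader
end

data = rd.read  # blocks until the writer closes its end
writer.join
rd.close

puts data  # => chunk-0 chunk-1 chunk-2
```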
Something like this:
require 'net/http'

def download_and_upload(source_url, dest_url)
  rd, wr = IO.pipe
  begin
    source_uri = URI.parse(source_url)
    Thread.start do
      begin
        Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
          req = Net::HTTP::Get.new(source_uri.request_uri)
          http.request(req) do |resp|
            resp.read_body do |chunk|
              wr.write(chunk)
              wr.flush
            end
          end
        end
      rescue IOError
        # Usually because the reader end was closed
      ensure
        wr.close rescue nil
      end
    end
    dest_uri = URI.parse(dest_url)
    Net::HTTP.start(dest_uri.host, dest_uri.port, use_ssl: dest_uri.scheme == 'https') do |http|
      req = Net::HTTP::Post.new(dest_uri.request_uri)
      # Net::HTTP requires either a Content-Length or chunked transfer
      # encoding when body_stream is set; the length is unknown here.
      req['Transfer-Encoding'] = 'chunked'
      req.body_stream = rd
      http.request(req)
    end
  ensure
    rd.close rescue nil
    wr.close rescue nil
  end
end
I've not tested this since I don't have an endpoint at the moment, but this is the principle of it.
Note that I've left out error handling. If the downloader thread fails, you'll need to catch the error and signal it to the uploader thread. (If the uploader fails, the download will stop because the write pipe will close.)
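One way to propagate a downloader failure is to record the exception in a shared variable and re-raise it once the pipe drains. The sketch below simulates that pattern with a trivial producer; the method name and the simulated error are illustrative only.

```ruby
# Sketch: capture a failure in the producer thread and re-raise it
# on the consumer side after the pipe drains.
def pump_with_error_propagation
  rd, wr = IO.pipe
  error = nil

  producer = Thread.new do
    begin
      wr.write("payload")
      raise "download failed"   # simulated downloader error
    rescue => e
      error = e                 # record it for the main thread
    ensure
      wr.close rescue nil       # unblocks the reader with EOF
    end
  end

  body = rd.read                # the "upload" side reads until EOF
  producer.join
  rd.close
  raise error if error          # surface the downloader's failure
  body
end

begin
  pump_with_error_propagation
rescue => e
  puts "caught: #{e.message}"   # => caught: download failed
end
```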