Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I read only x number of bytes of the body using Net::HTTP?

Tags:

http

ruby

It seems like the methods of Ruby's Net::HTTP are all or nothing when it comes to reading the body of a web page. How can I read, say, the just the first 100 bytes of the body?

I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.

like image 598
bvanderw Avatar asked Sep 17 '08 12:09

bvanderw


People also ask

Which number of byte to be read into a buffer of the system is?

The read() system call reads the input typed by the user via the keyboard (file descriptor 0) and stores it in the buffer (buff) which is nothing but a character array. It will read a maximum of 10 bytes (because of the third parameter). This can be less than or equal to the buffer size.

What is Net HTTP?

net. http is an API for making non-blocking HTTP requests to remote servers.


1 Answers

This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:

require 'net/http'

# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end

uri = URI("http://www.example.com/path/to/file")
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # calling request with a block prevents body from being read
    http.request(request) do |response|
      # do whatever limited reading you want to do with the socket
      x = response.socket.read(100);
      # be sure to call finish before exiting the block
      http.finish
    end
  end
rescue IOError
  # ignore
end

The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.

FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:

class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end
like image 188
Dustin Frazier Avatar answered Sep 23 '22 21:09

Dustin Frazier