Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a workaround to open URLs containing underscores in Ruby?

Tags:

ruby

open-uri

I'm using open-uri to open URLs.

resp = open("http://sub_domain.domain.com")

If it contains underscore I get an error:

URI::InvalidURIError: the scheme http does not accept registry part: sub_domain.domain.com (or bad hostname?)

I understand that this is because according to RFC URLs can contain only letters and numbers. Is there any workaround?

like image 629
Arty Avatar asked Mar 06 '11 05:03

Arty


3 Answers

This looks like a bug in URI, and uri-open, HTTParty and many other gems make use of URI.parse.

Here's a workaround:

require 'net/http'
require 'open-uri'

def hopen(url)
  begin
    open(url)
  rescue URI::InvalidURIError
    host = url.match(".+\:\/\/([^\/]+)")[1]
    path = url.partition(host)[2] || "/"
    Net::HTTP.get host, path
  end
end

resp = hopen("http://dear_raed.blogspot.com/2009_01_01_archive.html")
like image 86
stef Avatar answered Oct 22 '22 13:10

stef


URI has an old-fashioned idea of what an url looks like.

Lately I'm using addressable to get around that:

require 'open-uri'
require 'addressable/uri'

class URI::Parser
  def split url
    a = Addressable::URI::parse url
    [a.scheme, a.userinfo, a.host, a.port, nil, a.path, nil, a.query, a.fragment]
  end
end

resp = open("http://sub_domain.domain.com") # Yay!

Don't forget to gem install addressable

like image 41
pguardiario Avatar answered Oct 22 '22 14:10

pguardiario


This initializer in my rails app seems to make URI.parse work at least:

# config/initializers/uri_underscore.rb
class URI::Generic
  def initialize_with_registry_check(scheme,
                 userinfo, host, port, registry,
                 path, opaque,
                 query,
                 fragment,
                 parser = DEFAULT_PARSER,
                 arg_check = false)
    if %w(http https).include?(scheme) && host.nil? && registry =~ /_/
      initialize_without_registry_check(scheme, userinfo, registry, port, nil, path, opaque, query, fragment, parser, arg_check)
    else
      initialize_without_registry_check(scheme, userinfo, host, port, registry, path, opaque, query, fragment, parser, arg_check)
    end
  end
  alias_method_chain :initialize, :registry_check
end
like image 14
cluesque Avatar answered Oct 22 '22 14:10

cluesque