I'm trying to parse a URI from user input. I'm assuming some users won't put the scheme in their URIs and I want to default to "http".
The following code doesn't work:
require 'uri'
uri_to_check = URI::parse("www.google.com")
uri_to_check.scheme = "http" unless uri_to_check.scheme
puts uri_to_check.to_s
I expect to see "http://www.google.com" but I get "http:www.google.com". Is it even possible to do it this way?
If so, what am I missing?
Is there a better way to do this?
The leading slashes (//) indicate that the URL is an IP-based address, and are needed to flag the hostname so URI can parse it correctly.
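To make the effect of the slashes concrete, here is a small sketch using only the standard library. The helper name host_and_path is made up for illustration; it just reports which component URI assigns the text to.

```ruby
require 'uri'

# Hypothetical helper (not part of URI): returns [host, path] as URI parsed them.
def host_and_path(str)
  uri = URI.parse(str)
  [uri.host, uri.path]
end

# No leading slashes: the whole string ends up in the path, host stays nil.
p host_and_path("www.google.com")

# Leading slashes: the same text is recognized as the host.
p host_and_path("//www.google.com")

# Full form with a scheme, for comparison.
p host_and_path("http://www.google.com")
```

Once the host has been recognized this way, setting the scheme afterwards should serialize with the slashes in place, because to_s only emits "//" when a host is present.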
Wikipedia has some good overviews and examples of use:
http://en.wikipedia.org/wiki/Url , http://en.wikipedia.org/wiki/URI_scheme , http://en.wikipedia.org/wiki/URL_normalization
The best information is in the spec itself: http://www.ietf.org/rfc/rfc1738.txt particularly section 3.1, "Common Internet Scheme Syntax".
You might want to consider using the Addressable gem. It's smarter and is what I use when I need to do a lot of URI parsing or manipulation.
http://addressable.rubyforge.org/ and http://addressable.rubyforge.org/api/Addressable/URI.html
When the string you want to parse doesn't contain a scheme, URI doesn't recognize it as a hostname:
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> uri = URI::parse("www.google.com")
=> #<URI::Generic:0x11cfc88 URL:www.google.com>
irb(main):003:0> uri.path
=> "www.google.com"
irb(main):004:0> uri.host
=> nil
When you set the scheme as you do in your example and then call to_s, the URI is built without the host...
You can try something like the following (it's a quick hack; I don't know the URI details):
uri = URI::parse("www.google.com")
if uri.scheme.nil? && uri.host.nil?
  unless uri.path.nil?
    uri.scheme = "http"
    uri.host = uri.path
    uri.path = ""
  end
end
puts uri.to_s
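An alternative to patching the parsed object is to add the default scheme to the string before parsing, so URI sees the "//" and splits host and path itself. This is only a sketch: the method name parse_with_default_scheme and the SCHEME_PREFIX pattern are my own assumptions, and the pattern only treats input as having a scheme when it is followed by "://".

```ruby
require 'uri'

# Assumed pattern: a scheme name followed by "://", e.g. "http://" or "ftp://".
SCHEME_PREFIX = %r{\A[a-z][a-z0-9+.\-]*://}i

# Hypothetical helper: prepend default_scheme when the input has no scheme,
# then let URI.parse do the real work.
def parse_with_default_scheme(input, default_scheme = "http")
  text = input.strip
  text = "#{default_scheme}://#{text}" unless text.match?(SCHEME_PREFIX)
  URI.parse(text)
end

puts parse_with_default_scheme("www.google.com")
puts parse_with_default_scheme("www.google.com/search?q=ruby")
puts parse_with_default_scheme("https://www.google.com")
```

Because the prefix is added before parsing, input that includes a path or query keeps those in the right components instead of having everything moved into the host. URI.parse can still raise URI::InvalidURIError on malformed input, so real user input probably needs a rescue around the call.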