
Ruby: URI::InvalidURIError (URI must be ascii only)

Tags:

ruby

require 'uri'
uri = URI.parse 'http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg'

Browsers have no problem with http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg, so I'm asking myself whether this Ruby class is a little outdated. Should I abandon it completely, or do some error handling…

TNT asked Oct 20 '17 12:10



2 Answers

The answer just came to me by asking myself the question:

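require 'uri'

begin
  uri = URI.parse(url)
rescue URI::InvalidURIError
  # Fall back to percent-encoding the offending characters and parse again.
  # Note: URI.escape warns on Ruby 2.7 and no longer exists in Ruby 3.0.
  uri = URI.parse(URI.escape(url))
end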
TNT answered Oct 20 '22 16:10



With kudos to all the URI.escape answers (URI.escape is also known as URI.encode): these methods were officially made obsolete in Ruby 2.7, i.e. they now print a visible "URI.escape is obsolete" warning whenever you use them, whereas previously they were only deprecated. In Ruby 3.0 they were removed completely and are no longer available at all, not even with a warning.

Unfortunately, as far as I can tell, Ruby's standard library URI class does not offer any alternative for handling URIs containing non-ASCII characters, which are so common these days - <sarcasm>now that the web has gone international</sarcasm>.
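If adding a gem is not an option, one stopgap is to percent-encode the non-ASCII characters by hand before parsing. The following is only a minimal sketch, assuming a UTF-8 string whose host part is plain ASCII; `escape_non_ascii` and `parse_lenient` are hypothetical helper names, not part of the standard library:

```ruby
require 'uri'

# Hypothetical helper: percent-encode every non-ASCII character as its
# UTF-8 bytes. ASCII characters (including existing %XX escapes) are
# left untouched. Assumes the input string is UTF-8.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

# Hypothetical helper: try a strict parse first and only fall back to
# escaping when URI.parse rejects the input.
def parse_lenient(url)
  URI.parse(url)
rescue URI::InvalidURIError
  URI.parse(escape_non_ascii(url))
end

uri = parse_lenient('http://example.com/Оуэн-Мэтьюс.jpg')
puts uri.path
```

This only touches non-ASCII characters, so it will not repair other problems such as unescaped spaces, and it does nothing for internationalized host names.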

The best solution I came up with is the addressable gem, which contains the URI class we deserve: it handles everything the world has to throw at it, and you can get an "HTTP safe" URI using the #display_uri method:

Addressable::URI.parse("http://example.com/Оуэн-Мэтьюс.jpg")
=> #<Addressable::URI:0xc8 URI:http://example.com/Оуэн-Мэтьюс.jpg>
Addressable::URI.parse("http://example.com/Оуэн-Мэтьюс.jpg").display_uri.to_s
=> "http://example.com/%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"

Addressable::URI also comes with all kinds of goodies, such as port inference (you can check whether the URL originally contained a port specification, or simply not care) and URL canonicalization (given a base URL, take a possibly relative URL and generate an absolute URL).

Here's how to use this with net/http:

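require 'net/http'
require 'addressable/uri'
url = Addressable::URI.parse('http://example.com/Оуэн-Мэтьюс.jpg')
response = Net::HTTP.start(url.host, url.inferred_port,
                           :use_ssl => url.scheme == 'https') do |http|
  # Send the request; the block's return value becomes the response.
  http.request(Net::HTTP::Get.new(url.display_uri.request_uri))
end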
Guss answered Oct 20 '22 18:10
