 

Unescape special characters correctly from the URL in Rails 3.0.3

I'm using Rails 3.0.3 with REE (Ruby 1.8.7) and gem 'mysql2', '0.2.6'.

There's a search feature in my project that lets people search with a GET request, either by typing the URL directly or by submitting a form that generates the URL.

Example:

I want to search:

origin city: "Århus, Denmark" and destination city: "Asunción, Paraguay"

Both contain a special character ("Å" and "ó"), so when someone clicks the search button, the generated URL looks like this:

?&origin=%C5rhus%2C%20Denmark&destination=Asunci%F3n%2C%20Paraguay

Problem:

When I search for those cities, the parameters are not unescaped the way I expect (I tried CGI, URI, and even some gems).

In the console log, I can see the parameters and the queries ActiveRecord runs:

Parameters: {"destination"=>"Asunci�n, Paraguay", "origin"=>"�rhus, Denmark", "sort"=>"newest"}
City Load (0.1ms)  SELECT `cities`.* FROM `cities` WHERE (`cities`.`name` = '�rhus') ORDER BY cities.name ASC
City Load (6.8ms)  SELECT `cities`.* FROM `cities` WHERE (`cities`.`name` = 'Asunci�n, Paraguay') ORDER BY cities.name ASC

Conclusion: the cities can't be found :(
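A minimal sketch (Ruby 2.4 or later) of what seems to be happening: decoding `%C5` yields the single byte 0xC5, which is not a valid UTF-8 sequence, so anything that displays the string as UTF-8 shows a replacement character. `URI.decode_www_form_component` from Ruby's standard library stands in here for the `CGI.unescape` call mentioned above; the variable name is just for illustration.

```ruby
require 'uri'

# Decode the query value exactly as it appears in the URL above.
# decode_www_form_component tags the result as UTF-8 without validating it.
origin = URI.decode_www_form_component('%C5rhus%2C%20Denmark')

p origin.encoding          # the string claims to be UTF-8...
p origin.valid_encoding?   # ...but the lone 0xC5 byte is not valid UTF-8
p origin.bytes.first.to_s(16)

# String#scrub swaps each invalid byte for U+FFFD, the "�" seen in the log.
p origin.scrub
```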

However, I noticed something interesting:

  • When I introduce an error into the file associated with this feature, the error page shows the parameters like this:

    Request

    Parameters:
    {"destination"=>"Asunción, Paraguay", "origin"=>"Århus, Denmark", "sort"=>"newest"}
    

There, the values are displayed correctly!

Question:

Do you guys have an idea how to solve this? Thanks in advance :)

panggi asked Jan 17 '12 at 03:01



1 Answer

You're right, it looks like you have an encoding problem somewhere. The byte 0xC5 is "Å" in ISO-8859-1 (AKA Latin-1); in UTF-8, "Å" is the two bytes 0xC3 0x85, so it would appear as %C3%85 in the URL.
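To make the byte-level difference concrete, here is a small Ruby sketch that encodes the two accented letters from the question both ways. `percent` is a hypothetical helper written for this illustration, not a library method.

```ruby
# Hypothetical helper: percent-encode every byte of a string, e.g. [0xC5] -> "%C5".
def percent(str)
  str.bytes.map { |b| format('%%%02X', b) }.join
end

['Å', 'ó'].each do |ch|
  latin1 = ch.encode('ISO-8859-1') # one byte per character in Latin-1
  utf8   = ch.encode('UTF-8')      # two bytes for these characters in UTF-8
  puts "#{ch}: ISO-8859-1 #{percent(latin1)}, UTF-8 #{percent(utf8)}"
end
```

The ISO-8859-1 column gives `%C5` and `%F3`, the same escapes that appear in the question's URL.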

I suspect that you're using JavaScript on the client side and that it builds the URL with the old escape function. escape has issues with non-ASCII characters: anything below U+0100 comes out as a single Latin-1 byte (so "Å" becomes %C5 and "ó" becomes %F3) instead of as UTF-8. If this is the case, you should switch your JavaScript to encodeURIComponent. Have a look at this little demo and you'll see what I'm talking about:
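The same contrast can be sketched in Ruby. `legacy_js_escape` is a hypothetical helper that approximates JavaScript's `escape` (it does not handle surrogate pairs), and `ERB::Util.url_encode` from Ruby's standard library stands in for `encodeURIComponent`. The two agree for the characters in these city names, though not for every character (`encodeURIComponent` leaves `!'()*` alone, `url_encode` escapes them).

```ruby
require 'erb'

# Hypothetical approximation of JavaScript's legacy escape():
# characters below U+0100 become a single %XX (their Latin-1 code),
# everything else becomes %uXXXX. Surrogate pairs are not handled.
def legacy_js_escape(str)
  str.each_char.map do |ch|
    cp = ch.ord
    if ch.match?(%r{\A[A-Za-z0-9@*_+\-./]\z})
      ch                          # characters escape() leaves untouched
    elsif cp < 0x100
      format('%%%02X', cp)        # Latin-1 range: one byte, e.g. %C5
    else
      format('%%u%04X', cp)       # beyond Latin-1: %uXXXX
    end
  end.join
end

['Århus, Denmark', 'Asunción, Paraguay'].each do |city|
  puts legacy_js_escape(city)     # like escape(): Latin-1 bytes
  puts ERB::Util.url_encode(city) # like encodeURIComponent(): UTF-8 bytes
end
```

The first line of each pair reproduces the query values from the question; the second is what a UTF-8-aware server would expect.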

http://jsfiddle.net/ambiguous/U5A3k/

If you can't change the client-side script, then you can do it the hard way in Ruby using force_encoding and encode (both require Ruby 1.9 or later; REE is based on 1.8.7):

>> s = CGI.unescape('%C5rhus%2C%20Denmark')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('iso-8859-1')
=> "\xC5rhus, Denmark"
>> s.encoding
=> #<Encoding:ISO-8859-1>
>> s.encode!('utf-8')
=> "Århus, Denmark"
>> s.encoding
=> #<Encoding:UTF-8>

You should get something like "\xC5rhus, Denmark" from params, and you could unmangle it with:

s = params[:whatever].force_encoding('iso-8859-1').encode('utf-8')
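A slightly more defensive variant is sketched below, assuming a Ruby version where `String#force_encoding` and `String#encode` exist. It only reinterprets a value as ISO-8859-1 when the value is not already valid UTF-8, so correctly encoded input passes through unchanged. The helper name `latin1_to_utf8_if_invalid` is hypothetical, and the check is still a guess: some Latin-1 byte sequences happen to be valid UTF-8 and would slip through.

```ruby
# Hypothetical helper: repair a parameter that arrived as Latin-1 bytes
# but was tagged UTF-8. nil and valid UTF-8 are returned unchanged.
# This is a heuristic, not a general encoding detector.
def latin1_to_utf8_if_invalid(value)
  return value if value.nil? || value.valid_encoding?

  # dup so the original params string is not re-tagged in place;
  # every byte is valid ISO-8859-1, so the transcode cannot fail.
  value.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

mangled = "\xC5rhus, Denmark"   # what params[:origin] might hold
clean   = "Asunción, Paraguay"  # already valid UTF-8

puts latin1_to_utf8_if_invalid(mangled)
puts latin1_to_utf8_if_invalid(clean)
```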

Dealing with this on the server side should be a last resort, though: if your client-side code sends back incorrectly encoded data, you'll be left with a pile of guesswork on the server to figure out which encoding was actually used to get it into the URL.

mu is too short answered Dec 05 '22 at 04:12
