
JSON encoding wrongly escaped (Rails 3, Ruby 1.9.2)

In my controller, the following works (prints "oké")

puts obj.inspect

But this doesn't (renders "ok\u00e9")

render :json => obj

Apparently the to_json method escapes unicode characters. Is there an option to prevent this?

Michiel de Mare asked Feb 25 '11 23:02


4 Answers

To convert the \uXXXX escapes back to UTF-8:

json_string.gsub!(/\\u([0-9a-fA-F]{4})/) { [$1.to_i(16)].pack("U") }
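
The one-liner above converts each escape on its own, so a character outside the Basic Multilingual Plane, which JSON writes as a surrogate pair such as \ud83d\ude00, would not come out as a single valid character. A minimal sketch that also joins surrogate pairs might look like the following. Note that unescape_unicode is a hypothetical helper name, not part of Ruby or Rails, and the sketch assumes the input never contains an escaped backslash directly before a literal "u" (running the string through a real JSON parser is the safer route in that case).

```ruby
# Hypothetical helper (illustrative name): turns JSON \uXXXX escapes back
# into raw UTF-8, combining surrogate pairs into one code point.
# Assumption: no escaped backslash (\\) sits directly in front of "uXXXX".
def unescape_unicode(json_string)
  pattern = /\\u([dD][89abAB]\h{2})\\u([dD][c-fC-F]\h{2})|\\u(\h{4})/
  json_string.gsub(pattern) do
    if $3
      # Single escape inside the Basic Multilingual Plane.
      [$3.hex].pack("U")
    else
      # High surrogate followed by low surrogate: rebuild the code point.
      high = $1.hex - 0xD800
      low  = $2.hex - 0xDC00
      [0x10000 + (high << 10) + low].pack("U")
    end
  end
end

puts unescape_unicode('{"status":"ok\u00e9"}')
```

Unlike gsub!, this version returns a new string and never returns nil when there is nothing to replace.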
Wouter Vegter answered Nov 16 '22 10:11


You can prevent it by monkey patching the method mentioned by mu is too short. Put the following into config/initializers/patches.rb (or a similar file used for patches) and restart your Rails process for the change to take effect.

module ActiveSupport::JSON::Encoding
  class << self
    def escape(string)
      if string.respond_to?(:force_encoding)
        string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
      end
      json = string.gsub(escape_regex) { |s| ESCAPED_CHARS[s] }
      json = %("#{json}")
      json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
      json
    end
  end
end

Be advised that there's no guarantee that the patch will work with future versions of ActiveSupport. The version used when writing this post is 3.1.3.

David Backeus answered Nov 16 '22 10:11


If you dig through the source you'll eventually come to ActiveSupport::JSON::Encoding and the escape method:

def escape(string)
  if string.respond_to?(:force_encoding)
    string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
  end
  json = string.
    gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
    gsub(/([\xC0-\xDF][\x80-\xBF]|
           [\xE0-\xEF][\x80-\xBF]{2}|
           [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
    s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
  }
  json = %("#{json}")
  json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
  json
end

The first gsub call escapes the characters that JSON requires to be escaped; the second one is what forces non-ASCII UTF-8 into the \uXXXX notation that you're seeing. These \uXXXX escapes are valid JSON and should be acceptable to anything that processes JSON, but you could always post-process the JSON (or monkey patch in a modified JSON escaper) to convert the \uXXXX notation to raw UTF-8 if necessary.
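
To illustrate why the escaped form is normally harmless, here is a small sketch using Ruby's standard json library (plain Ruby, outside Rails; the sample document is made up for the example). It parses the same document once with the \u00e9 escape and once with the raw character, and compares the results.

```ruby
require "json"

# Single quotes: the backslash-u sequence stays literal, as in the rendered response.
escaped = '{"status":"ok\u00e9"}'
# Double quotes: Ruby itself turns \u00e9 into the raw character.
raw = "{\"status\":\"ok\u00e9\"}"

parsed_escaped = JSON.parse(escaped)
parsed_raw     = JSON.parse(raw)

# Both spellings decode to the same Ruby hash.
puts parsed_escaped == parsed_raw
```

A consumer that uses a conforming JSON parser therefore sees the same string either way; the difference only matters to something that reads the response body as plain text.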

I'd agree that forcing JSON to be 7-bit clean is a bit bogus, but there you go.

Short answer: no.

mu is too short answered Nov 16 '22 10:11


Characters were not escaped to Unicode with the other methods in Rails 2.3.11/Ruby 1.8, so I used the following:

render :json => JSON::dump(obj)
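
For comparison, here is a sketch of how the standard json library behaves in plain Ruby. It assumes only the stdlib is loaded; inside Rails, ActiveSupport changes how objects are serialized, so results there may differ. The generator leaves non-ASCII characters as they are unless the ascii_only option is set.

```ruby
require "json"

obj = { "status" => "ok\u00e9" } # the value is "oké"

# Default generator: non-ASCII characters are emitted as raw UTF-8.
raw_json = JSON.generate(obj)

# ascii_only: true asks the generator to escape non-ASCII characters.
ascii_json = JSON.generate(obj, ascii_only: true)

puts raw_json
puts ascii_json
```

The first line printed contains the raw é, while the second contains only ASCII characters.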
oldergod answered Nov 16 '22 11:11