to_json not converting special characters to unicode style

I'm having a problem with special characters when converting a hash to a JSON string.

Everything works fine with Ruby 2.0 / Rails 3.2.21, that is,

puts "“".to_json
#"\u201c"

But with Ruby 2.3.0 / Rails 4.2.5.1 I get

puts "“".to_json
#"“"

Is there any way to force Ruby 2.3.0 / Rails 4 to convert special characters to Unicode escape sequences (\uXXXX)?

Remark:

Notice that in Ruby 2.3 / Rails 4, we get

"“".to_json.bytesize == 5 #true

However, in 2.0 we get

"“".to_json.bytesize == 8 #true

So clearly the resulting strings themselves differ; it is not just a difference in how they are printed.
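The two byte counts can be reproduced without Rails. The following is only a sketch using the standard json library; the variable names are made up for illustration. It also shows that both strings are JSON encodings of the same value, so a conforming parser should not care which one it receives.

```ruby
require "json"

# "\u201C" in a double-quoted Ruby literal is the single character “ (3 bytes in UTF-8).
raw_json = "\"\u201C\""      # quote + “ + quote

# In a single-quoted literal \u is not interpreted, so this holds 6 ASCII characters
# between the quotes: backslash, u, 2, 0, 1, c.
escaped_json = '"\u201c"'

puts raw_json.bytesize       # 5 = 1 + 3 + 1
puts escaped_json.bytesize   # 8 = 1 + 6 + 1

# Both decode to the same one-character string.
# (Wrapped in an array so that older JSON parsers accept the document.)
same = JSON.parse("[#{raw_json}]") == JSON.parse("[#{escaped_json}]")
puts same                    # true
```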

Ingo asked Jun 23 '16 09:06


1 Answer

I ❤ Rails (just kidding.)

Rails 3 had a hilarious method that mangled UTF-8 in JSON. Rails 4, thanks to DHH, is free of this drawback.

So, if one wants a time machine, the simplest way is to monkeypatch ::ActiveSupport::JSON::Encoding.escape:

module ::ActiveSupport::JSON::Encoding
  # Rails 3 style escaping: multibyte UTF-8 sequences are rewritten as \uXXXX.
  def self.escape(string)
    if string.respond_to?(:force_encoding)
      # Work on a binary copy so the byte-oriented regex can match raw UTF-8 bytes.
      string = string.encode(::Encoding::UTF_8, :undef => :replace)
                     .force_encoding(::Encoding::BINARY)
    end
    json = string.
      gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        # code points -> 16-bit big-endian words -> hex digits -> \uXXXX per word
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    json = %("#{json}")
    json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
    json
  end
end
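The dense unpack/pack chain in the block passed to gsub is easier to follow when taken one step at a time. Here is a minimal sketch on a single character; the intermediate variable names are only for illustration and do not appear in Rails.

```ruby
bytes = "\u201C".b                   # binary copy of “, i.e. "\xE2\x80\x9C"

codepoints = bytes.unpack("U*")      # decode the UTF-8 bytes into code points: [0x201C]
words      = codepoints.pack("n*")   # each code point as a 16-bit big-endian word
hex        = words.unpack("H*")[0]   # hex digits of those words: "201c"
escaped    = hex.gsub(/.{4}/n, '\\\\u\&')  # prefix every 4 hex digits with \u

puts escaped                         # \u201c

# Caveat: "n" keeps only the low 16 bits, so a code point above U+FFFF
# is truncated instead of being written as a surrogate pair.
astral = "\u{1F600}".b.unpack("U*").pack("n*").unpack("H*")[0]
puts astral                          # f600
```

So this approach only round-trips characters from the Basic Multilingual Plane; anything beyond it comes out as a different character.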

A more robust solution would be to corrupt the result after the fact:

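class String
  # Post-processes an already encoded JSON string (e.g. the result of #to_json),
  # rewriting every multibyte UTF-8 sequence as Rails 3 style \uXXXX escapes.
  def rails3_style
    # Work on a binary copy so the byte-oriented regex can match raw UTF-8 bytes.
    string = encode(::Encoding::UTF_8, :undef => :replace).
               force_encoding(::Encoding::BINARY)
    json = string.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        # code points -> 16-bit big-endian words -> hex digits -> \uXXXX per word
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    # The receiver is already quoted by #to_json, so no extra quotes are added here.
    json.force_encoding(::Encoding::UTF_8)
  end
end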

puts "“".to_json.rails3_style
#⇒ "\u201c"
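Outside of ActiveSupport's to_json, the json library that ships with Ruby has an ascii_only generator option that produces the same kind of output without any monkeypatching. The following is only a sketch: ascii_json is a made-up helper name, and how this interacts with ActiveSupport's own to_json override inside a Rails app is not verified here.

```ruby
require "json"

# Hypothetical helper; ascii_only: true asks the json generator to escape
# every non-ASCII character as \uXXXX.
def ascii_json(obj)
  JSON.generate(obj, ascii_only: true)
end

puts ascii_json("quote" => "\u201C")   # {"quote":"\u201c"}

# Characters above U+FFFF are written as a surrogate pair rather than truncated.
puts ascii_json(["\u{1F600}"])
```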

I can hardly understand why anybody would want to do this on purpose, but there is the solution.

Aleksei Matiushkin answered Sep 24 '22 14:09