I'm trying to parse some JSON containing escaped Unicode characters using JSON.parse. But on one machine, using json/ext, it gives back incorrect values. For example, \u2030 should return E2 80 B0 in UTF-8, but instead I'm getting 01 00 00. It fails with both the escaped "\\u2030" and the unescaped "\u2030".
1.9.2p180 :001 > require 'json/ext'
=> true
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
=> {"f"=>"\u0001\u0000\u0000"}
1.9.2p180 :003 > s["f"].encoding
=> #<Encoding:UTF-8>
1.9.2p180 :004 > s["f"].valid_encoding?
=> true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
=> [1, 0, 0]
It works on my other machine with the same version of Ruby and similar environment variables. The Gemfile.lock on both machines is identical, including json (= 1.6.3). It does work with json/pure on both machines.
1.9.2p180 :001 > require 'json/pure'
=> true
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
=> {"f"=>"‰"}
1.9.2p180 :003 > s["f"].encoding
=> #<Encoding:UTF-8>
1.9.2p180 :004 > s["f"].valid_encoding?
=> true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
=> [226, 128, 176]
So is there something else in my environment or setup that could be causing it to parse incorrectly?
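In case it helps with diagnosing the environment, here's a small check (a sketch, not from my original session) that reports which parser implementation is actually loaded via JSON.parser and compares the decoded bytes against the expected UTF-8 encoding of U+2030:

require 'json'

puts JSON.parser                  # JSON::Ext::Parser or JSON::Pure::Parser
value = JSON.parse('{"f":"\\u2030"}')["f"]
puts value.bytes.to_a.inspect     # should be [226, 128, 176] when decoding works
puts value == "\u2030"            # should be true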
I recently ran into this same problem, and I tracked it down to a Ruby bug caused by how a buffer declaration in Ruby 1.9.2 gets optimized by GCC; it was fixed in a later commit. You can recompile Ruby with -O0 or use a newer version of Ruby (1.9.3 or later) to fix it.
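If rebuilding Ruby isn't an option right away, a minimal workaround sketch is to detect the broken extension at runtime and switch to the pure-Ruby parser. The probe string and fallback logic here are illustrative, not part of the json gem itself:

require 'json/ext'

# If the compiled extension mis-decodes \u escapes, fall back to json/pure,
# which routes subsequent JSON.parse calls through the pure-Ruby parser.
unless JSON.parse('{"probe":"\\u2030"}')["probe"] == "\u2030"
  require 'json/pure'
  JSON.parser = JSON::Pure::Parser
end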
Try upgrading your JSON gem to at least 1.6.6, or to the newest 1.7.1.
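A minimal Gemfile sketch for that, assuming you use Bundler (run bundle update json afterwards so Gemfile.lock picks up the new version):

gem 'json', '>= 1.6.6'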