
Ruby JSON.parse returning incorrect data for unicode

Tags: json, ruby

I'm trying to parse some JSON containing escaped Unicode characters using JSON.parse, but on one machine, using json/ext, it gives back incorrect values. For example, \u2030 should decode to the UTF-8 bytes E2 80 B0, but instead I'm getting 01 00 00. It fails with either the escaped "\\u2030" or the unescaped "\u2030".

1.9.2p180 :001 > require 'json/ext'
 => true 
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
 => {"f"=>"\u0001\u0000\u0000"} 
1.9.2p180 :003 > s["f"].encoding
 => #<Encoding:UTF-8> 
1.9.2p180 :004 > s["f"].valid_encoding?
 => true 
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
 => [1, 0, 0] 

It works on my other machine, which has the same version of Ruby and similar environment variables. The Gemfile.lock on both machines is identical, including json (= 1.6.3). With json/pure, it works on both machines.

1.9.2p180 :001 > require 'json/pure'
 => true 
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
 => {"f"=>"‰"} 
1.9.2p180 :003 > s["f"].encoding
 => #<Encoding:UTF-8> 
1.9.2p180 :004 > s["f"].valid_encoding?
 => true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
 => [226, 128, 176] 

So is there something else in my environment or setup that could be causing it to parse incorrectly?
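One way to narrow this down is a small self-check that can be run on both machines. The sketch below is not part of the original question. It assumes that plain `require 'json'` picks the same backend the application uses, and that `JSON::Parser` names the active parser class. It builds U+2030 directly with `Array#pack` and compares its bytes against what JSON.parse returns for the escaped form.

```ruby
require 'json'

# U+2030 (PER MILLE SIGN) encoded to UTF-8 directly, without going through JSON.
expected = [0x2030].pack('U').bytes.to_a

# The same character written as a JSON \u escape, as in the transcripts above.
parsed = JSON.parse('{"f":"\\u2030"}')['f'].bytes.to_a

# Report which parser class is active (json/ext or json/pure) and both byte lists.
puts "parser:   #{JSON::Parser}"
puts "expected: #{expected.inspect}"
puts "parsed:   #{parsed.inspect}"
puts(parsed == expected ? 'OK' : 'MISMATCH: the parser decoded the escape incorrectly')
```

If the two machines print different parser classes, or the same parser class but different bytes, that points at the compiled extension or the Ruby build rather than at the input.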

bklimt asked Apr 19 '12 18:04


2 Answers

I recently ran into this same problem and tracked it down to a Ruby bug caused by the way a buffer is declared in Ruby 1.9.2 and how GCC optimizes it. The bug has since been fixed in Ruby itself.

To fix it, either recompile Ruby with optimization disabled (-O0) or switch to a newer version of Ruby (1.9.3 or later).
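To see whether a given interpreter might be affected, it can help to print the Ruby version and the compiler flags it was built with. This is only a rough sketch. It assumes the build recorded its flags under `RbConfig::CONFIG['CFLAGS']`, and the "older than 1.9.3" test is a heuristic taken from the version numbers mentioned here, not a precise list of affected releases.

```ruby
require 'rbconfig'

# Compiler flags recorded when this Ruby was built (may be empty on some builds).
cflags = RbConfig::CONFIG['CFLAGS'].to_s

# Pull out any optimization flags such as -O0, -O2 or -O3.
opt_flags = cflags.scan(/-O\S*/)

# Heuristic only: treat anything before 1.9.3 as potentially affected.
older_than_193 = Gem::Version.new(RUBY_VERSION) < Gem::Version.new('1.9.3')

puts "ruby:               #{RUBY_VERSION}p#{RUBY_PATCHLEVEL}"
puts "CFLAGS:             #{cflags}"
puts "optimization flags: #{opt_flags.empty? ? '(none found)' : opt_flags.join(' ')}"
puts "older than 1.9.3:   #{older_than_193}"
```

Comparing this output between the working and the failing machine should show whether the two interpreters were really built the same way, even if they report the same version.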

Michael Pilat answered Nov 10 '22 13:11


Try upgrading your json gem to at least 1.6.6, or to the newest version, 1.7.1.
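A quick way to confirm which json version is actually loaded at runtime, as opposed to what Gemfile.lock says, is to compare `JSON::VERSION` against the suggested minimum. This is a minimal sketch, and the variable names are only illustrative.

```ruby
require 'json'

# Version of the json library that was actually loaded into this process.
installed = Gem::Version.new(JSON::VERSION)

# Minimum version suggested above.
minimum = Gem::Version.new('1.6.6')

puts "json #{installed}"
puts(installed >= minimum ? 'at or above 1.6.6' : 'older than 1.6.6; consider upgrading')
```

In a Bundler project, the corresponding constraint would be written as `gem 'json', '>= 1.6.6'` in the Gemfile, followed by `bundle update json`.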

Tom Meinlschmidt answered Nov 10 '22 13:11