
How to parse invalid JSON with unquoted keys using ActiveSupport 3 (Rails)

I need to parse certain invalid JSON in Ruby.

Something like:

json_str = '{name:"Javier"}'
ActiveSupport::JSON.decode json_str

As you can see, it's invalid because the hash key is not quoted. It should be:

json_str = '{"name":"Javier"}'

But the input can't be changed, so I have to parse it with the keys unquoted.

I could parse it with ActiveSupport 2.x, but ActiveSupport 3 doesn't allow it. It raises:

Yajl::ParseError: lexical error: invalid string in json text.
                                      {name:"Javier"}
                     (right here) ------^

By the way, it's a Ruby application using some Rails libraries, but it's not a Rails application.

Thanks in advance

Javier Fonseca asked Feb 03 '11 17:02



2 Answers

I would use a regular expression to fix this invalid JSON:

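require 'yajl'
json_str = '{name:"Javier"}'
# Wrap every key (bare or already quoted) in double quotes before parsing.
# Caveat: a word followed by a colon inside a string value is rewritten too.
json_str.gsub!(/(['"])?([a-zA-Z0-9_]+)(['"])?:/, '"\2":')
hash = Yajl::Parser.parse(json_str)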
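
To make the behaviour of that regex easier to check, here is a minimal sketch. It assumes Ruby's standard json library in place of Yajl (only so it runs without extra gems), and quote_keys_simple is a hypothetical helper name, not part of any library. It shows the case from the question and one input the regex damages:

```ruby
require 'json'

# Hypothetical helper: same regex as above, but non-destructive (gsub, not gsub!).
def quote_keys_simple(str)
  str.gsub(/(['"])?([a-zA-Z0-9_]+)(['"])?:/, '"\2":')
end

# The input from the question becomes valid JSON.
fixed = quote_keys_simple('{name:"Javier"}')
puts fixed                 # {"name":"Javier"}
parsed = JSON.parse(fixed)
puts parsed["name"]        # Javier

# A colon inside a string value is also treated as the end of a key,
# so this input is turned into something that no longer parses.
broken = quote_keys_simple('{time:"12:30"}')
puts broken                # {"time":"12":30"}
```

So this approach is probably only safe when you know the values never contain a word immediately followed by a colon.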
jmonteiro answered Nov 15 '22 06:11



Here's a somewhat robust regex you can use. It's not perfect: specifically, it fails in some corner cases where the values themselves contain JSON-like text, but it works in most common cases:

quoted_json = unquoted_json.gsub(/([{,]\s*)(\w+)(\s*:\s*["\d])/, '\1"\2"\3')

First it looks for either a { or a , since those are the only characters that can precede a key name (it also allows any amount of whitespace after them with \s*). It captures this as a group:

([{,]\s*)

Then it captures the key itself, which is composed of letters, digits, and underscores (for which regex conveniently supplies the \w character class):

(\w+)

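Finally, it matches what must follow a key name, i.e. a colon followed by either an opening quote (for a string value) or a digit (for a numeric value). As a consequence, keys whose values are nested objects, arrays, negative numbers, true, false, or null are left unquoted. It also allows extra whitespace, and captures the whole thing in a group: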

(\s*:\s*["\d])

For each match, it just puts the three pieces back together, but with quotes around the key (so quotes around capture group #2):

'\1"\2"\3'
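
Putting the pieces together, a minimal sketch of how this could be used. It assumes Ruby's standard json library for the parsing step (the question used ActiveSupport/Yajl), and quote_unquoted_keys is a hypothetical helper name introduced only for illustration:

```ruby
require 'json'

# Hypothetical helper wrapping the regex explained above.
def quote_unquoted_keys(unquoted_json)
  unquoted_json.gsub(/([{,]\s*)(\w+)(\s*:\s*["\d])/, '\1"\2"\3')
end

# String and numeric values: both keys get quoted.
quoted = quote_unquoted_keys('{name: "Javier", age: 30}')
puts quoted          # {"name": "Javier", "age": 30}
data = JSON.parse(quoted)
puts data["name"]    # Javier

# A key whose value is an object is not followed by a quote or a digit,
# so the outer key is left unquoted.
nested = quote_unquoted_keys('{user: {name: "Javier"}}')
puts nested          # {user: {"name": "Javier"}}
```

Keys that are already quoted are left alone, because \w+ cannot match the opening quote that follows the { or , character.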
Ben Lee answered Nov 15 '22 04:11
