Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hacking JavaScript Array Into JSON With Python

I am fetching a .js file from a remote site that contains data I want to process as JSON using the simplejson library on my Google App Engine site. The .js file looks like this:

var txns = [
    { apples: '100', oranges: '20', type: 'SELL'}, 
    { apples: '200', oranges: '10', type: 'BUY'}]

I have no control over the format of this file. What I did at first just to hack through it was to chop the "var txns = " bit off of the string and then do a series of .replace(old, new, [count]) on the string until it looked like standard JSON:

cleanJSON = malformedJSON.replace("'", '"').replace('apples:', '"apples":').replace('oranges:', '"oranges":').replace('type:', '"type":').replace('{', '{"transaction":{').replace('}', '}}')

So that it now looks like:

[{ "transaction" : { "apples": "100", "oranges": "20", "type": "SELL"} }, 
 { "transaction" : { "apples": "200", "oranges": "10", "type": "BUY"} }]

How would you tackle this formatting issue? Is there a known way (library, script) to format a JavaScript array into JSON notation?

like image 756
Greg Avatar asked Jan 24 '23 10:01

Greg


1 Answers

It's not too difficult to write your own little parsor for that using PyParsing.

import json
from pyparsing import *

data = """var txns = [
   { apples: '100', oranges: '20', type: 'SELL'}, 
   { apples: '200', oranges: '10', type: 'BUY'}]"""


def js_grammar():
    key = Word(alphas).setResultsName("key")
    value = QuotedString("'").setResultsName("value")
    pair = Group(key + Literal(":").suppress() + value)
    object_ = nestedExpr("{", "}", delimitedList(pair, ","))
    array = nestedExpr("[", "]", delimitedList(object_, ","))
    return array + StringEnd()

JS_GRAMMAR = js_grammar()

def parse(js):
    return JS_GRAMMAR.parseString(js[len("var txns = "):])[0]

def to_dict(object_):
    return dict((p.key, p.value) for p in object_)

result = [
    {"transaction": to_dict(object_)}
    for object_ in parse(data)]
print json.dumps(result)

This is going to print

[{"transaction": {"type": "SELL", "apples": "100", "oranges": "20"}},
 {"transaction": {"type": "BUY", "apples": "200", "oranges": "10"}}]

You can also add the assignment to the grammar itself. Given there are already off-the-shelf parsers for it, you should better use those.

like image 199
Torsten Marek Avatar answered Jan 26 '23 00:01

Torsten Marek