Given the following json:
{
    "README.rst": {
        "_status": {
            "md5": "952ee56fa6ce36c752117e79cc381df8"
        }
    },
    "docs/conf.py": {
        "_status": {
            "md5": "6e9c7d805a1d33f0719b14fe28554ab1"
        }
    }
}
is there a query language that can produce:
{
    "README.rst": "952ee56fa6ce36c752117e79cc381df8",
    "docs/conf.py": "6e9c7d805a1d33f0719b14fe28554ab1",
}
My best attempt so far with JMESPath (http://jmespath.org/) isn't very close:
>>> jmespath.search('*.*.md5[]', db)
['952ee56fa6ce36c752117e79cc381df8', '6e9c7d805a1d33f0719b14fe28554ab1']
I've gotten to the same point with ObjectPath (http://objectpath.org):
>>> t = Tree(db)
>>> list(t.execute('$..md5'))
['952ee56fa6ce36c752117e79cc381df8', '6e9c7d805a1d33f0719b14fe28554ab1']
I couldn't make any sense of JSONiq (do I really need to read a 105 page manual to do this?) This is my first time looking at json query languages..
not sure why you want a query language this is pretty easy
def find_key(data,key="md5"):
    for k,v in data.items():
       if k== key: return v
       if isinstance(v,dict):
          result = find_key(v,key)
          if result:return result
dict((k,find_key(v,"md5")) for k,v in json_result.items()) 
it's even easier if the value dict always has "_status" and "md5" as keys
dict((k,v["_status"]["md5"]) for k,v in json_result.items()) 
alternatively I think you could do something like
t = Tree(db)
>>> dict(zip(t.execute("$."),t.execute('$..md5'))
although I dont know that it would match them up quite right ...
Here is the JSONiq code that does the job:
{|
    for $key in keys($document)
    return {
        $key: $document.$key._status.md5
    }
|}
You can execute it here with the Zorba engine.
If the 105-page manual you mention is the specification, I do not recommend reading it as a JSONiq user. I would rather advise reading tutorials or books online, which give a more gentle introduction.
Do in ObjectPath:
l = op.execute("[keys($.*), $..md5]")
you'll get:
[
  [
    "README.rst",
    "docs/conf.py"
  ],
  [
    "952ee56fa6ce36c752117e79cc381df8",
    "6e9c7d805a1d33f0719b14fe28554ab1"
  ]
]
then in Python:
dict(zip(l[0],l[1]))
to get:
{
    'README.rst': '952ee56fa6ce36c752117e79cc381df8', 
    'docs/conf.py': '6e9c7d805a1d33f0719b14fe28554ab1'
}
Hope that helps. :)
PS. I'm using OPs' keys() to show how to make full query that works anywhere in the document not only when keys are in the root of document.
PS2. I might add new function so that it would look like: object([keys($.*), $..md5]). Shoot me tweet http://twitter.com/adriankal if you want that.
Missed the python requirement, but if you are willing to call external program, this will still work. Please note, that jq >= 1.5 is required for this to work.
# If single "key" $p[0] has multiple md5 keys, this will reduce the array to one key.
cat /tmp/test.json | \
jq-1.5 '[paths(has("md5")?) as $p | { ($p[0]): getpath($p)["md5"]}] | add '
# this will not create single object, but you'll see all key, md5 combinations
cat /tmp/test.json | \
jq-1.5 '[paths(has("md5")?) as $p | { ($p[0]): getpath($p)["md5"]}] '
Get paths with "md5"-key '?'=ignore errors (like testing scalar for key). From resulting paths ($p) filter and surround result with '{}' = object. And then those are in an array ([] surrounding the whole expression) which is then "added/merged" together |add
https://stedolan.github.io/jq/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With