
Using jq, Flatten Arbitrary JSON to Delimiter-Separated Flat Dictionary

Tags:

json

bash

jq

I'm looking to transform JSON using jq to a delimiter-separated and flattened structure.

There have been earlier attempts at this, for example "Flatten nested JSON using jq".

However, the solutions on that page fail if the JSON contains arrays. For example, if the JSON is:

{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}

The solutions on that page fail to transform it into:

{"a.b.0":1,"x.0.y":2,"x.1.z":3}

In addition, I'm looking for a solution that will also allow for an arbitrary delimiter. For example, suppose the space character is the delimiter. In this case, the result would be:

{"a b 0":1,"x 0 y":2,"x 1 z":3}

I'd like to access this functionality through a Bash function (Bash 4.2+, as found in CentOS 7), something like this:

flatten_json()
{
    local JSONData="$1"
    # jq command to flatten $JSONData, putting the result to stdout
    jq ... <<<"$JSONData"
}

The solution should work with all JSON data types, including null and boolean. For example, consider the following input:

{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}

It should produce:

{"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
Steve Amerige asked Feb 17 '17 14:02


1 Answer

If you stream the data in, you get a path/value pair for every leaf value. Any event that is not a pair is a lone path marking the end of an enclosing object or array. Using leaf_paths, as you found, only gives you paths to truthy leaf values, so you miss null and false values. Streaming does not have this problem.

There are many ways to combine these pairs into an object; I'm partial to using reduce with assignment in these situations.
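To make the stream concrete, here is a small sketch that prints every event tostream emits for a reduced version of the question's first example. The helper name stream_events is made up for illustration, and printf is piped into jq so the snippet does not depend on here-strings.

```shell
# Hypothetical helper: print every stream event of a JSON text, one per line.
# -c asks jq for compact (single-line) output.
stream_events() {
    printf '%s\n' "$1" | jq -c 'tostream'
}

stream_events '{"a":{"b":[1]},"x":[{"y":2}]}'
# [["a","b",0],1]     <- [path, leaf value] pair
# [["a","b",0]]       <- lone path: the array at a.b is closed
# [["a","b"]]         <- lone path: the object at a is closed
# [["x",0,"y"],2]     <- [path, leaf value] pair
# [["x",0,"y"]]       <- lone path: the object at x[0] is closed
# [["x",0]]           <- lone path: the array at x is closed
# [["x"]]             <- lone path: the top-level object is closed
```

Only the two-element events carry values, which is why filtering on length==2 is enough to recover every leaf.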

$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}

$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
    .[[$i[0][]|tostring]|join($delim)] = $i[1]
)' input.json
{
  "a.b.0": "p q r",
  "w.0.x": null,
  "w.1.y": false,
  "w.2.z": 3
}

Here's the same solution, broken up a bit to leave room for an explanation of what's going on.

$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
    [$i[0][]|tostring] as $path_as_strings
        | ($path_as_strings|join($delim)) as $key
        | $i[1] as $value
        | .[$key] = $value
)' input.json

Converting the input to a stream with tostream produces multiple values (pairs and lone paths) as input to our filter. reduce is designed to accept multiple values and fold them into a single result, so we pass them to it. Before we do, we keep only the pairs with select(length==2).

Then, in the reduce call, we start with an empty object and assign new values, using a key derived from the path and the corresponding value. Remember that the result produced for each item in the reduce call becomes the input for the next iteration. Binding values to variables doesn't change the current context, and each assignment effectively "modifies" the current value (initially the empty object) and passes it along.

$path_as_strings is just the path, which is an array of strings and numbers, converted to an array of strings only. [$i[0][]|tostring] is a shorthand I use as an alternative to map when the array I want to map is not the current input. It is more compact, since the mapping is done as a single expression, instead of having to write ($i[0]|map(tostring)) to get the same result. The outer parentheses might not be necessary in general, but that is still two separate filter expressions versus one (and more text).

From there, we join that array of strings into the desired key using the provided delimiter, then assign the value to that key in the current object.
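As a quick check of that equivalence, the sketch below builds the key from a single path both ways and compares the results. The helper name path_to_key and the output field names key and same are made up for illustration; --argjson is assumed to be available (jq 1.5 or later).

```shell
# Hypothetical helper: turn one JSON path array into a flat key, and report
# whether the shorthand and the map(tostring) form agree.
path_to_key() {
    jq -c -n --argjson path "$1" --arg delim "$2" '
        [$path[] | tostring] as $short          # shorthand form
        | ($path | map(tostring)) as $long      # map form
        | {key: ($short | join($delim)), same: ($short == $long)}'
}

path_to_key '["w",0,"x"]' '.'
# {"key":"w.0.x","same":true}
```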
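The question asked for this as a Bash function, so here is a minimal sketch that wraps the filter above. The optional second parameter for the delimiter (defaulting to ".") is my assumption, not part of the question's skeleton. The -c flag is added only to get the single-line output shown in the question, and printf is piped into jq in place of the question's here-string so the function also runs under plain sh.

```shell
# Sketch of the requested function. Usage: flatten_json JSON [DELIMITER]
# The DELIMITER parameter is an assumed extension; it defaults to "."
# when omitted, and an explicitly empty string is passed through as-is.
flatten_json() {
    local JSONData="$1"
    local Delimiter="${2-.}"
    # Keep only [path, value] pairs from the stream, then build one flat
    # object whose keys are the path components joined by the delimiter.
    printf '%s\n' "$JSONData" | jq -c --arg delim "$Delimiter" '
        reduce (tostream | select(length == 2)) as $i ({};
            .[[$i[0][] | tostring] | join($delim)] = $i[1]
        )'
}

flatten_json '{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}' ' '
# {"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
```

One caveat with any flat encoding like this: if an original key already contains the delimiter, two different paths can map to the same flat key, and the later assignment overwrites the earlier one.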

Jeff Mercado answered Oct 01 '22 07:10