I'm looking to transform JSON using jq
to a delimiter-separated and flattened structure.
There have been attempts at this. For example, Flatten nested JSON using jq.
However the solutions on that page fail if the JSON contains arrays. For example, if the JSON is:
{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}
The solution above will fail to transform the above to:
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
In addition, I'm looking for a solution that will also allow for an arbitrary delimiter. For example, suppose the space character is the delimiter. In this case, the result would be:
{"a b 0":1,"x 0 y":2,"x 1 z":3}
I'm looking to have this functionality accessed via a Bash (4.2+) function as is found in CentOS 7, something like this:
flatten_json()
{
local JSONData="$1"
# jq command to flatten $JSONData, putting the result to stdout
jq ... <<<"$JSONData"
}
The solution should work with all JSON data types, including null and boolean. For example, consider the following input:
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
It should produce:
{"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
If you stream the data in, you'll get pairings of paths and values of all leaf values. If not a pair, then a path marking the end of a definition of an object/array at that path. Using leaf_paths
as you found would only give you paths to truthy leaf values so you'll miss out on null
or even false
values. As a stream, you won't get this problem.
There are many ways this could be combined to an object, I'm partial to using reduce
and assignment in these situations.
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
.[[$i[0][]|tostring]|join($delim)] = $i[1]
)' input.json
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
Here's the same solution broken up a bit to allow room for explanation of what's going on.
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
[$i[0][]|tostring] as $path_as_strings
| ($path_as_strings|join($delim)) as $key
| $i[1] as $value
| .[$key] = $value
)' input.json
Converting the input to a stream with tostream
, we'll receive multiple values of pairs/paths as input to our filter. With this, we can pass those multiple values into reduce
which is designed to accept multiple values and do something with them. But before we do, we want to filter those pairs/paths by only the pairs (select(length==2)
).
Then in the reduce call, we're starting with a clean object and assigning new values using a key derived from the path and the corresponding value. Remember that every value produced in the reduce
call is used for the next value in the iteration. Binding values to variables doesn't change the current context and assignments effectively "modify" the current value (the initial object) and passes it along.
$path_as_strings
is just the path which is an array of strings and numbers to just strings. [$i[0][]|tostring]
is a shorthand I use as an alternative to using map
when the array I want to map is not the current array. This is more compact since the mapping is done as a single expression. That instead of having to do this to get the same result: ($i[0]|map(tostring))
. The outer parentheses might not be necessary in general but, it's still two separate filter expressions vs one (and more text).
Then from there we convert that array of strings to the desired key using the provided delimiter. Then assign the appropriate values to the current object.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With