I'd like to use jq
to do a left join on these two objects in an array by using the jq
JOIN operator:
[
{
"type": "spreadsheet",
"rows": [
[
"1",
"Ben",
"male"
],
[
"2",
"Cathy",
"female"
]
],
"columns": [
"id",
"name",
"sex"
]
},
{
"type": "spreadsheet",
"rows": [
[
"1",
"25"
],
[
"2",
"28"
]
],
"columns": [
"id",
"age"
]
}
]
to this:
{
"type": "spreadsheet",
"rows": [
[
"1",
"Ben",
"male",
"25"
],
[
"2",
"Cathy",
"female",
"28"
]
],
"columns": [
"id",
"name",
"sex",
"age"
]
}
The jq manual references SQL-style operators like INDEX
and JOIN
but I can't find any examples online of how to use them. Like any join, I need to iterate over the rows in the first object and merge with rows found in the second object based on the id
column.
The following adopts an object-oriented approach to the task, and ignores keys other than "rows" and "columns" in the input.
There are accordingly three steps:
toOO
;left_join
;toDB
)The main program is thus quite short:
map(toOO) | left_join(.id) | toDB
The helper functions, most of which are independently useful, are presented below but should appear before the main program above.
# headers should be an array of strings
def objectify(headers):
. as $in
| reduce range(0; headers|length) as $i ({};
. + {(headers[$i]): $in[$i]} );
def fromObject:
. as $in
| reduce keys_unsorted[] as $k ( {rows:[], columns:[]};
.columns += [$k] | .rows += [$in[$k]] );
# Input: an array of object
# Output: a single object with array-valued keys formed from
# the input object by (blindly) concatenating the values
def merge:
def allkeys: map(keys) | add | unique;
allkeys as $allkeys
| reduce .[] as $in ({};
reduce $allkeys[] as $k (.;
. + {($k): (.[$k] + [$in[$k]]) } ));
# id should be a filter such as .id
# Apply merge but do not replicate id
def merge_by(id):
merge | (id |= .[0] ) ;
# Create an object-representation of the columns/rows database
def toOO:
.columns as $headers
| [.rows[] | objectify($headers)];
def toDB:
map(fromObject)
| merge_by(.columns);
# Input: an array of two arrays, each consisting of objects forming a database.
# "key" specifies the (possibly composite) key to be used to form the
# left-join of the two databases.
# That is, objects are NOT added to the first database.
def left_join(key):
def updateObject(obj):
reduce .[] as $x ([];
if ($x|key) == (obj|key)
then . + [$x + obj] else . + [$x]
end);
reduce .[1][] as $x (.[0]; updateObject($x) );
{
"columns": [
"id",
"name",
"sex",
"age"
],
"rows": [
[
"1",
"Ben",
"male",
"25"
],
[
"2",
"Cathy",
"female",
"28"
]
]
}
JOIN
builtinIf your jq has JOIN/4
, then it can be used to implement left_join/1
as follows:
def left_join(key):
map(INDEX(key))
| [ JOIN( .[1]; .[0][]; key; add) ] ;
In theory, this should be more efficient than the definition given above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With